Ch13 Checkpointing and Recovery
Uploaded by jonah-mckenzie
Outline
- Introduction: what? why? where?
- Problems in rollback
- Incarnation numbers
- Taxonomy of solution techniques
- Uncoordinated checkpointing
- Coordinated checkpointing
- Synchronous logging
- Asynchronous logging
- Adaptive logging
Introduction
During a computation, a node might fail and then be repaired. After a failed processor has been repaired, how can the system be brought back to a consistent global state?
If every processor periodically:
- records its local state on stable storage, and
- records the messages it receives on stable storage,
then the system can be brought to a consistent global state by rolling it back to a previously recorded global state.
Terminology
- checkpointing: recording the local state on stable storage
- logging received messages: recording received messages on stable storage
Recovery line
A set C of local checkpoints forms a consistent state (also called a recovery line) if the following conditions are satisfied:
1) there are no lost messages in C (a message is lost if its send is recorded in C but its receipt is not)
2) there are no orphan messages in C (a message is an orphan if its receipt is recorded in C but its send is not)
3) C contains exactly one checkpoint for each processor
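The three conditions can be checked mechanically. The sketch below assumes each checkpoint is represented by the local event index at which it was taken (events with a smaller index are recorded in it) and each message by its send and receive indices; `Message`, `classify` and `is_recovery_line` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class Message:
    name: str
    sender: str
    send_time: int             # local event index at the sender
    receiver: str
    recv_time: Optional[int]   # local event index at the receiver; None = still in transit

def classify(checkpoints: Dict[str, int],
             messages: Iterable[Message]) -> Tuple[List[Message], List[Message]]:
    """Return (lost, orphan) messages with respect to a set of checkpoints.

    checkpoints[p] is the local event index at which p's checkpoint was taken;
    events of p with a smaller index are recorded in that checkpoint.
    """
    lost, orphan = [], []
    for m in messages:
        send_recorded = m.send_time < checkpoints[m.sender]
        recv_recorded = m.recv_time is not None and m.recv_time < checkpoints[m.receiver]
        if send_recorded and not recv_recorded:
            lost.append(m)      # the send survives the rollback, the receipt does not
        elif recv_recorded and not send_recorded:
            orphan.append(m)    # received, but the send is undone by the rollback
    return lost, orphan

def is_recovery_line(checkpoints: Dict[str, int], processors: Iterable[str],
                     messages: Iterable[Message]) -> bool:
    # Condition 3: exactly one checkpoint per processor (one dict entry each).
    if set(checkpoints) != set(processors):
        return False
    lost, orphan = classify(checkpoints, messages)
    return not lost and not orphan      # conditions 1 and 2

# Hypothetical event indices chosen to reproduce the situation on the next slide.
cps = {"p1": 5, "p2": 4, "p3": 6}
msgs = [Message("m1", "p2", 6, "p1", 3),   # sent after c2, received before c1
        Message("m2", "p1", 2, "p3", 7),   # sent before c1, received after c3
        Message("m3", "p3", 1, "p2", 9)]   # sent before c3, received after c2
lost, orphan = classify(cps, msgs)
print([m.name for m in lost], [m.name for m in orphan])
```

With these values, m2 and m3 are reported as lost and m1 as an orphan, so {c1, c2, c3} is not a recovery line.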
Problems in rollback
The goal of rollback is to bring the system back to a consistent state.
Some precautions must be taken for this to work properly.
For simplicity, we do not consider channel state during rollback.
To see the problem, assume:
1) processors checkpoint from time to time
2) checkpoints are established independently, without any coordination between processors
[Figure: processors p1, p2, p3 with uncoordinated checkpoints c1, c2, c3 and messages m1, m2, m3]
The global state formed by c1, c2, c3 is inconsistent; it contains:
- lost messages: m2, m3
- orphan messages: m1
Problems in rollback: cascading rollbacks
[Figure: processors p, q, r with checkpoints p1 to p3, q1 to q4, r1 to r4, and messages m1 to m5 exchanged between them]
"p rolls back to p3" requires, because of message m1, that "r rolls back to r4"
...
{p2, q3, r3} is a recovery line
A rollback by one processor can cause an avalanche of rollbacks.
How can this be avoided?
Problems in rollback: I/O stuttering
[Figure: processors p, q, r; p performs an I/O event after its checkpoint pi]
Rolling back processor p to pi requires that the I/O event be re-executed: this is I/O stuttering. How can we avoid this?
- Log inputs: avoids input stuttering
- Output commit: avoids output stuttering
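The slide only names output commit. A minimal sketch of one way to realize it, assuming outputs are buffered until a checkpoint makes the state that produced them recoverable, is shown below; `OutputCommitProcess`, its methods, and the list standing in for the external device are all hypothetical.

```python
from typing import List

class OutputCommitProcess:
    """Hypothetical sketch: outputs to the outside world are held back until the
    state that produced them has been checkpointed (output commit)."""

    def __init__(self, device: List[str]):
        self.state = 0
        self.stable_state = 0          # stands in for a checkpoint on stable storage
        self.pending: List[str] = []   # outputs produced since the last checkpoint
        self.device = device           # stands in for the external output device

    def compute(self, x: int) -> None:
        self.state += x
        self.pending.append(f"state={self.state}")   # not yet visible outside

    def checkpoint(self) -> None:
        self.stable_state = self.state
        # The producing state is now recoverable, so releasing the pending
        # outputs cannot cause them to be emitted twice after a rollback.
        self.device.extend(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        # Uncommitted outputs are discarded; re-execution will regenerate them.
        self.state = self.stable_state
        self.pending.clear()
```

For example, after `compute(1); checkpoint(); compute(2); rollback(); compute(2); checkpoint()`, the device has seen each output exactly once: `["state=1", "state=3"]`.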
Problems in rollback: message duplication
[Figure: p sends m to q after checkpoint pi, and q receives it (r(m)); p then rolls back to pi and, after recovering, sends m again]
Processor p rolls back to pi; there is no need for q to roll back.
After recovery, processor p sends m again. Processor q should recognize that m is a duplicate message.
Incarnation numbers: handling duplicate messages
Every processor:
- maintains an incarnation number on stable storage
- stores a guess of the incarnation number of every other processor
On every recovery from failure or rollback, the incarnation number is incremented.
Each message carries the incarnation number of its sender.
[Figure: a processor's timeline divided into periods 0, 1, 2, separated by a recovery from failure and a rollback]
The evolution of a processor is organized into periods; incarnation numbers identify these periods.
When processor p receives a message m from processor q, p behaves as follows:
- if m.incarnation < incarnation[q]: m is a duplicate; discard it
- if m.incarnation = incarnation[q]: deliver m
- if m.incarnation > incarnation[q]: m belongs to an incarnation of q that p does not know about yet, so p blocks the delivery of m until m.incarnation = incarnation[q]
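The three-way receive rule can be sketched as follows. How p learns about q's new incarnation is not specified on the slides, so the `learn_incarnation` hook, like the `Receiver` class itself, is a hypothetical name; initial guesses are assumed to be 0.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Tuple

class Receiver:
    """Sketch of the incarnation-number rule applied by processor p."""

    def __init__(self) -> None:
        self.incarnation: DefaultDict[str, int] = defaultdict(int)   # p's guess of incarnation[q]
        self.blocked: DefaultDict[str, List[Tuple[int, Any]]] = defaultdict(list)
        self.delivered: List[Tuple[str, Any]] = []

    def receive(self, sender: str, inc: int, payload: Any) -> str:
        known = self.incarnation[sender]
        if inc < known:
            return "discarded"                      # duplicate from an older period
        if inc == known:
            self.delivered.append((sender, payload))
            return "delivered"
        self.blocked[sender].append((inc, payload))   # incarnation not known yet
        return "blocked"

    def learn_incarnation(self, sender: str, inc: int) -> None:
        """Hypothetical hook: p learns that `sender` is now in incarnation `inc`."""
        self.incarnation[sender] = inc
        still_blocked = []
        for m_inc, payload in self.blocked[sender]:
            if m_inc == inc:
                self.delivered.append((sender, payload))   # can now be delivered
            elif m_inc > inc:
                still_blocked.append((m_inc, payload))      # still in the future
            # m_inc < inc: the message is now a duplicate and is dropped
        self.blocked[sender] = still_blocked
```

A message from incarnation 1 of q stays blocked until p's guess for q reaches 1; after that, messages still carrying incarnation 0 are discarded.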
Choices to be made to implement a recovery scheme
To log or not to log messages?
Logging messages:
+ increases flexibility at recovery time
- expensive (space)
- processes must be deterministic (which is often not the case)
Choices to be made to implement a recovery scheme (cont.)
To coordinate or not to coordinate state recording?
Uncoordinated checkpoints: sufficient information (we'll see later) must be kept for rollback
+ keeps the cost of establishing checkpoints low
- the amount of rollback may be unbounded
Coordinated checkpoints: the set of checkpoints together forms a recovery line
+ limits the amount of rollback
- increases the cost of establishing checkpoints
Uncoordinated checkpointing
Assumptions
1. Processors asynchronously checkpoint from time to time
2. No coordination between processors for the establishment of checkpoints
3. No logging of messages
Goal: find the maximal recovery line (the latest recovery line), i.e. the one that happens after every other possible recovery line
Uncoordinated checkpointing: checkpoint interval algorithm (progressive rollback)
Notation
- Ci,j: the jth checkpoint of processor pi
- Ii,j: the interval ]Ci,j ; Ci,j+1[, i.e. the processing interval of pi between Ci,j and Ci,j+1
Definition: Ik,l depends on Ii,j iff there is a message m sent in Ii,j and received in Ik,l
[Figure: message m sent by pi between Ci,j and Ci,j+1 and received by pk between Ck,l and Ck,l+1]
Idea of the algorithm
When a processor pi fails and is then repaired:
1. Processor pi initiates recovery by restoring its last checkpoint, say Ci,j
2. Every processor pk whose interval Ik,l depends on Ii,j rolls back (but to which checkpoint? We'll see later)
3. This process continues recursively (transitively) until a recovery line is determined
To support recovery, information about interval dependence must be recorded (this is the "sufficient information" mentioned earlier).
Interval dependence graph: capturing rollback requirements
GI is a graph in which:
- VI: the vertices are the checkpoint intervals that exist when recovery starts
- EI: the directed edges, such that
  1) for every processor pi, (Ii,j, Ii,j+1) is in EI
  2) if Ik,l depends on Ii,j, then (Ii,j, Ik,l) is added to EI
[Figure: the two rules drawn as GI edges: Ii,j -> Ii,j+1, and Ii,j -> Ik,l when a message sent in Ii,j is received in Ik,l]
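The two construction rules translate directly into an adjacency structure. In this sketch, nodes are (processor, interval index) pairs with intervals numbered from 1, and the dependence information is assumed to be available as a list of (send interval, receive interval) pairs; `build_interval_graph` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]   # (i, j) stands for I_{i,j}

def build_interval_graph(num_intervals: Dict[int, int],
                         dependences: Iterable[Tuple[Interval, Interval]]
                         ) -> Dict[Interval, Set[Interval]]:
    """num_intervals[i] = number of intervals of p_i when recovery starts.
    dependences: pairs ((i, j), (k, l)) meaning a message was sent in I_{i,j}
    and received in I_{k,l}."""
    graph: Dict[Interval, Set[Interval]] = {
        (i, j): set() for i, n in num_intervals.items() for j in range(1, n + 1)}
    for i, n in num_intervals.items():
        for j in range(1, n):
            graph[(i, j)].add((i, j + 1))    # rule 1: consecutive intervals of p_i
    for src, dst in dependences:
        graph[src].add(dst)                  # rule 2: I_{k,l} depends on I_{i,j}
    return graph

g = build_interval_graph({1: 2, 2: 2}, [((1, 1), (2, 2))])
```

Here `g[(1, 1)]` contains both `(1, 2)` (rule 1) and `(2, 2)` (rule 2).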
Intuition behind the interval dependence graph: if processor pi rolls back to Ci,j and Ik,l depends on Ii,j, then processor pk must roll back to Ck,l. This avoids orphan messages.
[Figure: pi rolls back to Ci,j; a message m sent in Ii,j was received by pk in Ik,l, so because of m, pk must roll back to Ck,l]
Interval dependence graph illustrated:
[Figure: message passing and checkpointing among p1, p2, p3 (intervals I1,1 to I1,4, I2,1 to I2,3, I3,1 to I3,4; messages m1 to m5), and the corresponding interval dependence graph]
The checkpoint interval algorithm (progressive rollback)
When a processor pi fails and is then repaired, pi performs:
Step 1. Compute GI.
Step 2. Mark the node of GI corresponding to its last checkpoint interval; let Ii,j be that node. Mark all the nodes of GI that are reachable from Ii,j.
Step 3. For each processor k, define the "best checkpoint" of k w.r.t. the recovery of pi to be Ck,l such that l = min {j | Ik,j is marked}. Every processor rolls back to its best checkpoint.
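Steps 2 and 3 amount to a graph reachability search followed by a per-processor minimum. The sketch below takes GI as an adjacency dict over (processor, interval) nodes; the function name and the small example graph are hypothetical. It also assumes that a processor with no marked interval keeps its current state, since the minimum over an empty set is undefined.

```python
from collections import deque
from typing import Dict, Set, Tuple

Interval = Tuple[int, int]   # (k, j) stands for I_{k,j}

def best_checkpoints(graph: Dict[Interval, Set[Interval]],
                     failed: int, last_interval: int) -> Dict[int, int]:
    """Return {k: l}: processor k rolls back to C_{k,l}.
    Processors absent from the result have no marked interval (assumed not to roll back)."""
    start = (failed, last_interval)
    marked = {start}
    queue = deque([start])
    while queue:                          # Step 2: mark every node reachable from I_{i,j}
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in marked:
                marked.add(nxt)
                queue.append(nxt)
    best: Dict[int, int] = {}
    for k, l in marked:                   # Step 3: l = min {j | I_{k,j} is marked}
        best[k] = min(l, best.get(k, l))
    return best

# Hypothetical GI: a message from I_{2,1} to I_{3,2} and one from I_{3,2} to I_{1,2}.
graph = {(1, 1): {(1, 2)}, (1, 2): set(),
         (2, 1): {(2, 2), (3, 2)}, (2, 2): set(),
         (3, 1): {(3, 2)}, (3, 2): {(1, 2)}}
```

If p3 fails during I3,2, then p3 restores C3,2 and p1 must roll back to C1,2 (because of the message received in I1,2), while p2 is unaffected.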
The algorithm illustrated: assume that p2 fails and is then repaired
Step 1. p2 computes GI.
[Figure: the interval dependence graph of the previous example]
Step 2. p2 marks all the nodes of GI reachable from its last checkpoint interval.
[Figure: the interval dependence graph with the reachable nodes marked]
Step 3. Each processor rolls back to its best checkpoint w.r.t. the recovery of p2.
Recall: for each processor k, the best checkpoint of k w.r.t. the recovery of p2 is Ck,l such that l = min {j | Ik,j is marked}.
[Figure: the recovery line determined, drawn on the message-passing diagram of p1, p2, p3 (intervals I1,1 to I3,4, messages m1 to m5)]
Some comments about the checkpoint interval algorithm
- Rollback can take the system all the way back to its initial state.
- The algorithm presented is centralized: it can be implemented on a recovery manager that directs all the participants to restart, each from its best checkpoint. In a distributed version, recovery control messages must be used to communicate parts of GI.
Coordinated checkpointing
Idea: processors coordinate the checkpointing of their local states to ensure that the checkpoints taken by the different processors form a recovery line. This avoids cascading rollbacks.
Method used: similar to the one used for computing a "global snapshot"; however, there are some differences.
Coordinated checkpointing: subtleties
1. Only processor states are recorded (saves space)
2. Failures during checkpointing are handled
3. Only the minimum number of checkpoints is stored (saves space)
4. Lost messages are handled by the communication protocol (so a consistent set of checkpoints may now contain lost messages)
5. There are no orphan messages in the computed set of checkpoints
6. Only a minimal number of processors need to checkpoint. Idea: old checkpoints of some processors together with new checkpoints of others may form a consistent set of checkpoints
Coordinated checkpointing: Koo & Toueg 87 (the original algorithm)
Uses a two-phase protocol to ensure that either all processors checkpoint or none do.
Two types of checkpoints are used for this:
- "tentative checkpoint": established while global state recording is ongoing
- "permanent checkpoint": if the recorded state is consistent, tentative checkpoints become permanent checkpoints
Basic idea
Phase 1
Initiator q:
1. q takes a tentative checkpoint;
2. q requests all other processors to take tentative checkpoints.
Non-initiator p, on receiving this request:
1. p decides whether or not to establish the tentative checkpoint;
2. p sends its decision to the initiator;
3. p waits for the final decision from q (i.e. refrains from any communication with other processors until the second phase is over).
Basic idea (cont.)
Phase 2
Initiator q:
1. q collects the decisions from all other processors;
2. if all other processors have taken tentative checkpoints, then q makes its tentative checkpoint permanent; otherwise q undoes its tentative checkpoint;
3. q requests all other processors to carry out the same final decision.
Non-initiator p: on receiving the final decision, p executes it.
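The all-or-nothing behaviour of the two phases can be simulated in a few lines. This sketch runs synchronously in one address space (so "no communication until phase 2 is over" holds trivially); `Processor`, its `willing` flag and `coordinated_checkpoint` are hypothetical names, and a processor's state is just an integer.

```python
from typing import List, Optional

class Processor:
    def __init__(self, name: str, willing: bool = True):
        self.name = name
        self.willing = willing             # hypothetical: can it checkpoint right now?
        self.state = 0
        self.permanent: Optional[int] = None
        self.tentative: Optional[int] = None

    def take_tentative(self) -> bool:
        if self.willing:
            self.tentative = self.state
        return self.willing                # the decision sent back to the initiator

    def make_permanent(self) -> None:
        self.permanent, self.tentative = self.tentative, None

    def undo_tentative(self) -> None:
        self.tentative = None

def coordinated_checkpoint(initiator: Processor, others: List[Processor]) -> bool:
    # Phase 1: the initiator checkpoints tentatively, then asks everyone else.
    if not initiator.take_tentative():
        return False
    decisions = [p.take_tentative() for p in others]
    # Phase 2: either every tentative checkpoint becomes permanent, or all are undone.
    everyone = [initiator, *others]
    if all(decisions):
        for p in everyone:
            p.make_permanent()
        return True
    for p in everyone:
        p.undo_tentative()
    return False
```

If any participant refuses, the previous permanent checkpoints of all processors are left untouched.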
The basic idea ensures that there are no orphan messages. Why?
Answer: no communication is allowed until the second phase is over.
It is not necessary for all processors to record their state during checkpointing. Why?
[Figure: processors p1, p2, p3 with checkpoints C1,1, C1,2, C2,1, C2,2, C3,1, C3,2 and the messages exchanged between them]
p1 initiates checkpointing by establishing C1,2, then contacts p2 and p3 by sending (red) request messages.
Assume that everything goes fine and p2 and p3 establish C2,2 and C3,2 respectively as new checkpoints.
{C1,2, C2,2, C3,2} form a consistent set of checkpoints.
However, {C1,2, C2,1, C3,2} also form a consistent set of checkpoints (i.e. there are no orphan messages). Hence, processor p2 need not take a new checkpoint.
Ensuring a minimum number of checkpoints
Every processor assigns monotonically increasing sequence numbers to the messages it sends.
Each processor p uses:
- p.last_rec[1..M], an array of sequence numbers: p.last_rec[i] = sequence number of the last message that p received from processor pi since p's last checkpoint
- p.first_sent[1..M], an array of sequence numbers: p.first_sent[i] = sequence number of the first message that p sent to processor pi since p's last checkpoint
Ensuring a minimum number of checkpoints (cont.)
When an initiator processor q requests a processor p to take a tentative checkpoint, q appends q.last_rec[p] to its request.
On receiving this request from q, processor p takes the tentative checkpoint only if p.first_sent[q] ≤ q.last_rec[p], i.e. only if q has received a message that p sent after p's last checkpoint.
[Figure: p sends a message after its last checkpoint, and q receives it after q's last checkpoint and before q's current checkpoint. Only in this case does p take a new checkpoint, to avoid an orphan message]
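The per-processor bookkeeping and the test p.first_sent[q] ≤ q.last_rec[p] can be sketched as below. Delivery is assumed to be instantaneous, an undefined entry (no message exchanged since the last checkpoint) is represented by a missing key or `None`, and `Proc` and its method names are hypothetical.

```python
from typing import Dict, Optional

class Proc:
    """Sketch of the sequence-number bookkeeping at one processor."""

    def __init__(self, name: str):
        self.name = name
        self.next_seq = 1                       # monotonically increasing send counter
        self.last_rec: Dict[str, int] = {}      # sender -> seq of last msg received since last checkpoint
        self.first_sent: Dict[str, int] = {}    # receiver -> seq of first msg sent since last checkpoint
        self.checkpoint_cohort = set()          # processors received from since last checkpoint

    def send(self, dest: "Proc") -> None:
        seq = self.next_seq
        self.next_seq += 1
        self.first_sent.setdefault(dest.name, seq)   # keep only the first one
        dest.receive(self.name, seq)                 # instantaneous delivery (assumption)

    def receive(self, sender: str, seq: int) -> None:
        self.last_rec[sender] = seq
        self.checkpoint_cohort.add(sender)

    def checkpoint(self) -> None:
        # A new checkpoint starts a new interval: reset the per-interval records.
        self.last_rec.clear()
        self.first_sent.clear()
        self.checkpoint_cohort.clear()

    def must_checkpoint_for(self, q_name: str, q_last_rec_me: Optional[int]) -> bool:
        """p.first_sent[q] <= q.last_rec[p]; an undefined value means 'no message'."""
        first = self.first_sent.get(q_name)
        return first is not None and q_last_rec_me is not None and first <= q_last_rec_me
```

If p's only message to q was sent before p's last checkpoint, the test fails and p keeps its old checkpoint; once p sends q a new message after checkpointing, the test succeeds.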
Ensuring a minimum number of checkpoints (cont.)
Only processors that have sent messages received by the initiator q since q's last checkpoint need to consider establishing a new checkpoint requested by q.
Hence, an initiator processor q should send requests only to those processors p from which q received a message between q's last checkpoint and its current checkpoint.
[Figure: p sends a message that q receives between q's last checkpoint and q's current checkpoint]
Ensuring a minimum number of checkpoints (cont.)
Every processor q maintains q.checkpoint_cohort: the set of processors from which q has received some messages since q's last checkpoint, i.e. the processors p such that q received a message from p between its last checkpoint and its current checkpoint.
The algorithm
Phase 1, initiator processor q:
  1. take tentative checkpoint;
  2. for every processor p in q.checkpoint_cohort do
       send (Request_tentative_chkp; q.last_rec[p]) to p;
Phase 1, non-initiator processor p:
  on receiving (Request_tentative_chkp; q.last_rec[p]) from q:
    if (ready to perform tentative checkpoint) and (p.first_sent[q] ≤ q.last_rec[p]) then
      take tentative checkpoint;
      for every processor r in p.checkpoint_cohort do
        send (Request_tentative_chkp; p.last_rec[r]) to r;
      p.replies := empty;   /* set of replies */
      for every processor r in p.checkpoint_cohort do
        wait until r sends "OK" or "KO", timeout = T;
        on "OK": add r to p.replies;
      if p.replies ≠ p.checkpoint_cohort then send "KO" to q
      else send "OK" to q
Phase 2, initiator processor q:
  1. q.replies := empty;   /* set of replies */
  2. for every processor p in q.checkpoint_cohort do
       wait until p sends "OK" or "KO", timeout = T;
       on "OK": add p to q.replies;
     if q.replies ≠ q.checkpoint_cohort then
       undo tentative checkpoint;
       send "undo tentative checkpoint" to every processor in q.checkpoint_cohort
     else
       permanent := tentative;
       send "make tentative checkpoint permanent" to every processor in q.checkpoint_cohort
Phase 2, non-initiator processor p:
  wait until q sends "undo …" or "make … permanent", timeout = T;
  on "undo …" do undo tentative checkpoint end
  on "make … permanent" do checkpoint := tentative_checkpoint end
  if no timeout then
    m := message received;
    for every processor r in p.checkpoint_cohort do send m to r;
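The cohort-based algorithm can be simulated without timeouts or failures by replacing request/reply messages with recursive calls. The sketch makes several assumptions that the slides leave open: a processor that does not satisfy the sequence-number test replies "OK" without checkpointing, a processor already holding a tentative checkpoint in this round replies "OK", and the phase-2 decision is applied directly to every processor that checkpointed rather than forwarded along the cohorts. `Node`, `on_request` and `initiate` are hypothetical names.

```python
from typing import Dict, List, Tuple

class Node:
    """A processor in the sketch; messages are delivered instantly."""
    def __init__(self, name: str, ready: bool = True):
        self.name, self.ready = name, ready
        self.seq = 0                          # sequence number of the last message sent
        self.last_rec: Dict[str, int] = {}    # sender -> seq of last message received since last checkpoint
        self.first_sent: Dict[str, int] = {}  # receiver -> seq of first message sent since last checkpoint
        self.cohort = set()                   # checkpoint_cohort
        self.tentative = False
        self.permanent_checkpoints = 0

    def send(self, dest: "Node") -> None:
        self.seq += 1
        self.first_sent.setdefault(dest.name, self.seq)
        dest.last_rec[self.name] = self.seq
        dest.cohort.add(self.name)

def on_request(p: Node, q: Node, q_last_rec_p: int,
               nodes: Dict[str, Node], involved: List[Node]) -> str:
    """Phase 1 at non-initiator p for a request from q; returns "OK" or "KO"."""
    if p.tentative:
        return "OK"        # assumption: p already checkpointed in this round
    first = p.first_sent.get(q.name)
    if first is None or first > q_last_rec_p:
        return "OK"        # assumption: no orphan possible, so p need not checkpoint
    if not p.ready:
        return "KO"
    p.tentative = True
    involved.append(p)
    replies = [on_request(nodes[r], p, p.last_rec[r], nodes, involved)
               for r in sorted(p.cohort)]
    return "OK" if all(x == "OK" for x in replies) else "KO"

def initiate(q: Node, nodes: Dict[str, Node]) -> Tuple[bool, List[str]]:
    """Run both phases from initiator q; returns (committed, names that checkpointed tentatively)."""
    q.tentative = True
    involved = [q]
    replies = [on_request(nodes[p], q, q.last_rec[p], nodes, involved)
               for p in sorted(q.cohort)]
    commit = all(x == "OK" for x in replies)
    for n in involved:                  # Phase 2: apply the initiator's decision
        n.tentative = False
        if commit:
            n.permanent_checkpoints += 1
            n.last_rec, n.first_sent, n.cohort = {}, {}, set()   # new interval starts
    return commit, sorted(n.name for n in involved)
```

For instance, if c sends to b and b then sends to a, a round initiated by a pulls in b and, transitively, c. If b's only message to a predates b's latest checkpoint, b is not asked to checkpoint again.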
Handling failures
Idea: failures are detected by timeouts.
On recovery, if the recovering processor was the initiator, it undoes its tentative checkpoint and sends this decision to the other processors; otherwise, the recovered processor consults the initiator or some other processor to find out the final decision.
Logging
Idea: processors record incoming messages.
Purpose: avoid the need for "resending"; reduce the amount of rollback (idea of a virtual checkpoint).
[Figure: logged messages and the resulting virtual checkpoint]
+ flexibility
- expensive
Synchronous logging
Idea: each message must be logged before it can be delivered. During recovery, logged messages are replayed until the recovering processor is up to date. Logging before delivery guarantees that every message whose loss could cause a subsequent rollback can be replayed.
Problem: expensive (every delivery waits for a write to stable storage).
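A minimal sketch of synchronous logging and replay, assuming a deterministic process whose state is an integer updated by adding each message; the list standing in for stable storage and the class `SyncLoggingProcess` are hypothetical.

```python
from typing import List, Tuple

class SyncLoggingProcess:
    def __init__(self) -> None:
        self.stable_log: List[int] = []          # stands in for stable storage
        self.checkpoint: Tuple[int, int] = (0, 0)   # (state, number of log entries it covers)
        self.state = 0

    def deliver(self, msg: int) -> None:
        self.stable_log.append(msg)   # 1. log the message first (synchronously)
        self.state += msg             # 2. only then deliver it (deterministic step)

    def take_checkpoint(self) -> None:
        self.checkpoint = (self.state, len(self.stable_log))

    def crash_and_recover(self) -> None:
        state, upto = self.checkpoint
        self.state = state                       # restore the checkpoint
        for msg in self.stable_log[upto:]:       # replay logged messages in order
            self.state += msg
```

Because every delivered message is on stable storage, recovery reaches exactly the pre-crash state.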
Asynchronous logging
Idea: each message must be logged, but not necessarily before it is delivered. Messages can first be saved in main memory:
- idle periods can be exploited to log messages
- several messages can be packed together and then logged at once (efficient use of I/O devices)
Problem: some messages may be lost, so replay is not always possible.
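The same deterministic process with asynchronous logging shows why replay can fail. In this sketch (hypothetical class `AsyncLoggingProcess`, integer state, batch size chosen arbitrarily), delivered messages sit in a volatile buffer and are written to the simulated stable log in batches; whatever is still buffered at a crash cannot be replayed.

```python
from typing import List, Tuple

class AsyncLoggingProcess:
    def __init__(self, batch_size: int = 3) -> None:
        self.batch_size = batch_size
        self.volatile: List[int] = []     # delivered but not yet logged (main memory)
        self.stable_log: List[int] = []   # stands in for stable storage
        self.state = 0
        self.checkpoint: Tuple[int, int] = (0, 0)   # (state, number of log entries it covers)

    def deliver(self, msg: int) -> None:
        self.state += msg                 # delivered immediately
        self.volatile.append(msg)         # logged later
        if len(self.volatile) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One write for several messages (could also run during an idle period).
        self.stable_log.extend(self.volatile)
        self.volatile.clear()

    def crash_and_recover(self) -> None:
        self.volatile.clear()             # main memory is lost in the crash
        state, upto = self.checkpoint
        self.state = state
        for msg in self.stable_log[upto:]:
            self.state += msg
```

After delivering 1, 2, 3, 4 with a batch size of 3, the state is 10, but recovery only reaches 6: message 4 was still in memory and is lost.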