
Uniprocessor Checkpointing

CS 717 – Fall 2001

9/25/01

The Need to Save State

Many of the fault-tolerant (FT) systems we have discussed need a way to restart processes from earlier points in their computation

A checkpoint is just a ‘snapshot’ of a process (or system) at a certain point in time

A checkpointing system provides a way to take these snapshots, and to restart from them

Types of Ckpt Systems

Kernel Level

OS supports ckpt & recovery

Transparent to the application and developer

User Level

Application linked against (user) library

Library functions perform ckpt and recovery

Transparent to application

Limitations (cannot restore PID, PPID, etc.)

Application Level

Applications coded to ckpt themselves and to restart from a checkpoint
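The application-level approach can be sketched in C as a loop that saves and reloads its own state. Everything here is hypothetical and for illustration only: the struct `app_state`, the functions `save_state`, `load_state`, and `run`, and the file format (one raw struct written with `fwrite`). The sketch writes to a temporary file and renames it over the old checkpoint (POSIX `rename` semantics assumed), so a crash during the write cannot destroy the last good checkpoint.

```c
#include <stdio.h>

/* Hypothetical application state: everything needed to resume the loop. */
struct app_state {
    long next_i;   /* next loop iteration to run */
    long sum;      /* partial result accumulated so far */
};

/* Write state to "<path>.tmp", then rename it over <path>. */
int save_state(const char *path, const struct app_state *s) {
    char tmp[256];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) return -1;
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Returns 1 and fills *s if a checkpoint exists, else 0 (start fresh). */
int load_state(const char *path, struct app_state *s) {
    struct app_state tmp;
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = fread(&tmp, sizeof tmp, 1, f);
    fclose(f);
    if (n != 1) return 0;
    *s = tmp;
    return 1;
}

/* Sum 0..n-1, checkpointing before every `every`-th iteration.
 * Returns -1 after `stop_after` iterations to simulate a crash
 * (pass a negative value to run to completion). */
long run(const char *path, long n, long every, long stop_after) {
    struct app_state s = {0, 0};
    load_state(path, &s);                       /* restart point, if any */
    for (long done_here = 0; s.next_i < n; s.next_i++, done_here++) {
        if (done_here == stop_after) return -1; /* simulated crash */
        if (s.next_i % every == 0 && save_state(path, &s) != 0) return -2;
        s.sum += s.next_i;
    }
    return s.sum;
}
```

Calling `run` once with a simulated crash and then again without one resumes from the last saved iteration and still produces the full sum, which is exactly the work an application-level programmer has to do by hand.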

Comparison of Levels

Kernel & User (System) Level

Easy to add checkpointing to existing code

Works with (almost) any program

General, ‘coarse’ approach

Application Level

Could require a complete rewrite or extensive modifications

Specific, ‘fine-grained’ solutions

System Level Checkpointing

Libckpt (1994): Plank, Beck, Kingsley (UTK), Li (Princeton)

User level library for UNIX

Libckpt: User Level Checkpoint Library Goals

Transparent: requires minimal modifications to code and re-linking

Low overhead

Automatic optimizations to reduce ckpt file size

Allow user-directed checkpointing

Libckpt Overview

Taking the ‘snapshot’

Suspend the process

Write the process’ memory and registers to a file

Recovery

Reload the executable from the original file

Reconstruct memory and register state from the checkpoint file

Libckpt Operation

The application’s main() is renamed ckpt_target()

The library’s main() checks whether it is in restore mode (specified using a command line option); otherwise it reads checkpoint parameters from a file
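The renaming trick can be sketched as follows. Only `ckpt_target` and the `=recover` option come from the slides; `parse_mode` and `library_main` are hypothetical names (the latter stands in for the library’s own `main()` so the sketch can be linked into a test program), and the recovery and parameter-reading steps are left as comments rather than guessed at.

```c
#include <stdio.h>
#include <string.h>

/* The application's original main(), renamed so the library can own main(). */
int ckpt_target(int argc, char **argv) {
    (void)argc; (void)argv;
    return 0;                       /* application work would go here */
}

enum mode { MODE_NORMAL, MODE_RECOVER };

/* Scan the command line for the "=recover" option. */
enum mode parse_mode(int argc, char **argv) {
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "=recover") == 0)
            return MODE_RECOVER;
    return MODE_NORMAL;
}

/* Stand-in for the library's main() (hypothetical name). */
int library_main(int argc, char **argv) {
    if (parse_mode(argc, argv) == MODE_RECOVER) {
        /* A real library would restore heap and stack from the checkpoint
         * file here and longjmp into the saved context, never returning. */
        fprintf(stderr, "recovery is not implemented in this sketch\n");
        return 1;
    }
    /* A real library would read checkpoint parameters and arm its timer here. */
    return ckpt_target(argc, argv);
}
```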

Libckpt Operation (2)

main() sets a timer to interrupt the application every n seconds

On signal:

Uses setjmp to record registers, pc, etc.

Writes the stack and heap segments to file

Resumes application code
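A minimal sketch of the periodic interrupt, assuming POSIX `sigaction` and `setitimer`. The handler here only counts ticks; in libckpt the handler is where the checkpoint itself is taken. The names `start_ckpt_timer`, `stop_ckpt_timer`, and `ticks`, and the millisecond interval (the slides use n seconds), are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Number of timer interrupts seen so far. */
static volatile sig_atomic_t ticks = 0;

static void on_timer(int sig) {
    (void)sig;
    ticks++;   /* a checkpointing handler would setjmp and write stack/heap here */
}

/* Install the handler and raise SIGALRM every interval_ms milliseconds. */
int start_ckpt_timer(long interval_ms) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* restart interrupted system calls */
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;

    struct itimerval it;
    it.it_interval.tv_sec = interval_ms / 1000;
    it.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    it.it_value = it.it_interval;   /* first tick after one interval */
    return setitimer(ITIMER_REAL, &it, NULL);
}

/* Disarm the timer. */
int stop_ckpt_timer(void) {
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}
```

Using `ITIMER_REAL` means wall-clock intervals; `ITIMER_VIRTUAL` (SIGVTALRM) would instead count only CPU time spent in the process, which is another reasonable policy for a checkpointer.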

Libckpt Operation (3)

If the application is started with =recover as a command line option:

Application begins, recovering the Text segment (reloaded from the executable)

Open checkpoint file

Recover heap from file

Recover stack from file

Restore register file (using longjmp)

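Virtual Address Space

[Figure: from address 0 upward, Text (ending at &etext), Data (Static) (ending at &edata), and Heap (ending at sbrk(0)); the Stack occupies the region from SP to the Bottom of Stack]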
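On Linux and many other Unix systems, the linker defines the symbols `etext`, `edata`, and `end`, and `sbrk(0)` returns the current program break (top of the heap); these are the boundaries labeled in the figure. A minimal sketch, assuming a Linux/glibc toolchain: the symbols and `sbrk` are not ISO C, and `sbrk` is a legacy interface. The `segments` struct and `find_segments` are hypothetical, and the address of a local variable is used only as an approximation of SP.

```c
#define _DEFAULT_SOURCE          /* exposes sbrk() in glibc's <unistd.h> */
#include <stdint.h>
#include <unistd.h>

/* Linker-provided symbols; only their addresses are meaningful. */
extern char etext, edata, end;

struct segments {
    uintptr_t text_end;   /* &etext: first address past the text segment */
    uintptr_t data_end;   /* &edata: first address past initialized data */
    uintptr_t bss_end;    /* &end: first address past uninitialized data */
    uintptr_t heap_top;   /* sbrk(0): current program break */
    uintptr_t stack_ptr;  /* address of a local: approximates SP */
};

struct segments find_segments(void) {
    volatile int here = 0;           /* lives in the current stack frame */
    struct segments s;
    s.text_end = (uintptr_t)&etext;
    s.data_end = (uintptr_t)&edata;
    s.bss_end = (uintptr_t)&end;
    s.heap_top = (uintptr_t)sbrk(0);
    s.stack_ptr = (uintptr_t)&here;
    return s;
}
```

On a modern Linux system with address-space randomization there is usually an unmapped gap between `&end` and the start of the heap, so a checkpointer cannot simply copy every byte from the data segment to `sbrk(0)` as one contiguous range.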

Checkpoint And Recovery Algorithms

main()
    if (recovery)
        restore stack
        restore heap
        pos = top of stack
        longjmp(pos, 1)          // restore regs.
    else
        run usual code

signal_handler()
    jmp_buf pos
    if (setjmp(pos) == 0)        // regs. saved in a known position on the stack
        write stack
        write heap
    // else: setjmp returned 1, so the process has just been recovered
    return                       // either way, resume application code
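The key to the handler is that `setjmp` returns twice: 0 when the registers are first saved, and the value passed to `longjmp` when control comes back. The sketch below shows only that control flow within a single process; the function `ckpt_demo` and the counter are hypothetical. It does not restore a stack from a file, which libckpt must do before calling `longjmp`, and which standard C cannot express (jumping into a frame that has already returned is undefined behavior).

```c
#include <setjmp.h>

static jmp_buf pos;                 /* saved registers, pc, and stack pointer */
static int checkpoints_taken = 0;

/* Returns 1 if execution reached the "recovered" branch, 0 otherwise. */
int ckpt_demo(int simulate_recovery) {
    volatile int recovered = 0;
    if (setjmp(pos) == 0) {
        /* First return: registers are now saved in pos. A checkpointer
         * would write the stack (which contains pos) and heap to a file. */
        checkpoints_taken++;
        if (simulate_recovery)
            longjmp(pos, 1);        /* setjmp "returns" again, this time with 1 */
    } else {
        recovered = 1;              /* process recovered */
    }
    return recovered;               /* resume application code */
}
```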

Illustration

Checkpoint: main() → user_main() → fun1() → fun2(); a signal arrives: save regs on stack, save stack to file, save heap to file, resume

Recovery: main() → restore(): restore stack, restore heap, take the jump (longjmp) back into the saved context

Optimization: Incremental Checkpointing

Observation: between taking two checkpoints, only a portion of the memory has actually been changed

Optimization: save only what has changed since the last ckpt; the rest can be read from previous ckpts

Taking Incremental Ckpts

After taking a ckpt (and after initialization), set the protection on all pages to ‘read-only’

A write to a page will cause a protection violation

The libckpt library catches that signal, sets the page protection to ‘read-write’, and marks the page as dirty

When writing the checkpoint file, only write dirty pages
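The page-protection scheme can be sketched with `mmap`, `mprotect`, and a `SIGSEGV` handler. This is a minimal sketch under stated assumptions, not libckpt’s code: it tracks a single hypothetical region of `NPAGES` pages rather than the whole heap, it assumes Linux (where such faults arrive as `SIGSEGV`; some systems use `SIGBUS`), and it calls `mprotect` from the handler, which works on Linux but is not on POSIX’s async-signal-safe list. The names `incr_init`, `incr_reset`, and `incr_write_dirty` are hypothetical.

```c
#define _GNU_SOURCE                  /* MAP_ANONYMOUS, SA_SIGINFO on glibc */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 8                     /* size of the tracked region, in pages */

char *region;                        /* memory whose writes are tracked */
size_t page_size;
static volatile sig_atomic_t dirty[NPAGES];

/* A write hit a read-only page: mark it dirty and make it writable. */
static void on_write_fault(int sig, siginfo_t *si, void *ctx) {
    (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr, base = (uintptr_t)region;
    if (addr < base || addr >= base + NPAGES * page_size) {
        signal(sig, SIG_DFL);        /* not our region: a genuine crash */
        return;
    }
    size_t idx = (addr - base) / page_size;
    dirty[idx] = 1;
    mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}   /* returning retries the faulting write, which now succeeds */

int incr_init(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Right after a checkpoint: all pages clean and read-only again. */
int incr_reset(void) {
    for (int i = 0; i < NPAGES; i++) dirty[i] = 0;
    return mprotect(region, NPAGES * page_size, PROT_READ);
}

/* Write only dirty pages as (index, contents) records; returns pages written. */
int incr_write_dirty(FILE *out) {
    int written = 0;
    for (int i = 0; i < NPAGES; i++) {
        if (!dirty[i]) continue;
        if (fwrite(&i, sizeof i, 1, out) != 1 ||
            fwrite(region + (size_t)i * page_size, page_size, 1, out) != 1)
            return -1;
        written++;
    }
    return written;
}
```

Only the first write to each page after a checkpoint pays for a fault; later writes to the same page run at full speed because the page is already read-write.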

Drawbacks to Incremental Ckpt

Required to keep multiple copies of the checkpoint file

On recovery, old copies of data are restored unnecessarily, only to be overwritten by newer copies
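To see the second drawback concretely, here is a hypothetical in-memory model of recovery from a chain of incremental checkpoints. Each increment holds (page, contents) records; increments are replayed oldest first, so a page saved in several increments is restored several times and all but its last copy is wasted work. The types, the tiny 4-byte “pages”, and the `recover` function are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

#define PAGE 4          /* tiny "pages" to keep the model readable */
#define MAXPAGES 64     /* model limit on page indices */

struct page_rec { int index; char data[PAGE]; };
struct increment { const struct page_rec *recs; size_t nrecs; };

/* Replay increments oldest first into memory. Returns how many page copies
 * were restored only to be overwritten later, or -1 on a bad index. */
int recover(char *memory, const struct increment *incs, size_t ninc) {
    int restored = 0, distinct = 0;
    int seen[MAXPAGES] = {0};
    for (size_t k = 0; k < ninc; k++)
        for (size_t r = 0; r < incs[k].nrecs; r++) {
            const struct page_rec *p = &incs[k].recs[r];
            if (p->index < 0 || p->index >= MAXPAGES) return -1;
            memcpy(memory + (size_t)p->index * PAGE, p->data, PAGE);
            restored++;
            if (!seen[p->index]) { seen[p->index] = 1; distinct++; }
        }
    return restored - distinct;
}
```

With a full checkpoint of three pages followed by two increments that rewrite pages 1 and 2, six page copies are restored but only three survive.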

Optimization: Asynchronous Checkpointing

Observation: the process must be suspended while the checkpoint file is written

Optimization: a separate thread could write the checkpoint file while the main thread is allowed to continue

Asynchronous Checkpointing

Make a copy of the process space

2nd thread writes the copy to disk

1st thread continues without halting

Asynchronous Checkpointing (2)

Unix fork() provides the necessary behavior

When about to take ckpt, process forks

OS makes a complete copy of the original process’ space

Clone writes the ckpt file, then dies

Original continues computing
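The fork() approach can be sketched in a few lines of POSIX C. The function names `async_checkpoint` and `wait_checkpoint` are hypothetical, and the “process state” is reduced to a single int array for illustration. The child sees the parent’s memory as it was at the moment of `fork()`, so the parent may keep modifying its data while the child writes the snapshot.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a clone that writes data[0..n) to path and exits. The parent gets
 * the clone's pid (or -1 if fork failed) and keeps computing. */
pid_t async_checkpoint(const char *path, const int *data, size_t n) {
    pid_t pid = fork();
    if (pid != 0) return pid;            /* parent, or fork error */
    /* Child: a frozen copy of the parent's memory at fork() time. */
    FILE *f = fopen(path, "wb");
    int ok = f && fwrite(data, sizeof *data, n, f) == n;
    if (f && fclose(f) != 0) ok = 0;
    _exit(ok ? 0 : 1);                   /* _exit: skip the parent's atexit/stdio */
}

/* Reap a checkpoint clone; returns 0 if it wrote its file successfully. */
int wait_checkpoint(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

In the usage below, the parent overwrites `data[0]` immediately after forking, yet the file still holds the original value, which is the snapshot semantics the slide relies on.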

Copy-On-Write Checkpointing

Like asynchronous checkpointing, but only copy page if the two versions are about to differ

Some (most?) OSes implement fork() this way, so the benefit is automatic

Checkpoint Compression

Use a standard data compression algorithm to shrink the size of the checkpoint file

Only reduces overhead if compression is faster than disk writes and the compression ratio is significant

“For uniprocessor checkpointing, this is not the case”

Not implemented in libckpt
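The trade-off can be made concrete with a simple cost model. The model is an assumption: the whole checkpoint is compressed first and the compressed bytes are then written, with no overlap between the two steps. Under it, compressing pays off only when size/comp + ratio·size/disk < size/disk, i.e. when the compression throughput exceeds disk / (1 − ratio). The function names are hypothetical.

```c
/* Parameters (all assumptions of the model):
 *   size_mb   checkpoint size in MB
 *   disk_mbs  disk write bandwidth in MB/s
 *   comp_mbs  compression throughput in MB/s of input
 *   ratio     compressed size / original size, in (0, 1] */

double write_time(double size_mb, double disk_mbs) {
    return size_mb / disk_mbs;
}

double compressed_write_time(double size_mb, double disk_mbs,
                             double comp_mbs, double ratio) {
    return size_mb / comp_mbs + ratio * size_mb / disk_mbs;
}

/* 1 if compress-then-write beats writing the raw checkpoint. */
int compression_helps(double size_mb, double disk_mbs,
                      double comp_mbs, double ratio) {
    return compressed_write_time(size_mb, disk_mbs, comp_mbs, ratio)
         < write_time(size_mb, disk_mbs);
}
```

For a 100 MB checkpoint on a 10 MB/s disk, a 100 MB/s compressor halving the data takes 1 s + 5 s = 6 s instead of 10 s; the same compressor achieving only a 0.95 ratio takes 10.5 s and loses, and a 20 MB/s compressor in front of a 100 MB/s disk loses even at a 0.5 ratio.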

User Directed Checkpointing

As described so far, libckpt is (almost) entirely transparent to the programmer

Compare to application-level checkpointing, which requires extensive code changes

Is there a middle ground?

Libckpt allows programmers to annotate application code with directives that guide the checkpointing

Memory Exclusion

Certain areas of memory can be excluded from the checkpoint

Dead memory – will never be read or written

Clean memory – values have not changed since the previous checkpoint

Incremental ckpt provides the clean memory optimization at a coarse level (page size)

Only writing the ‘active’ areas of the stack and heap provides the dead memory optimization

User Directed Memory Exclusion

Libckpt provides the application programmer with two functions:

exclude_bytes(ptr, length, usage)

Specify an area of memory to exclude from future checkpoints

include_bytes(ptr, length)

Add a previously excluded area of memory to future checkpoints

Clean Memory

If mem is clean:

exclude_bytes(mem, …, CKPT_READONLY)

Include mem in the next checkpoint, but exclude it from all subsequent ones

Cannot write to mem until after a call to include_bytes(mem)

On recovery, restore the last saved version of mem

Clean Memory: Example

for (…) {
    A = init_A();
    exclude_bytes(A, …, CKPT_READONLY);
    do_stuff(A);             // assuming A does not change
    include_bytes(A, …);
}

Dead Memory

If mem is dead:

exclude_bytes(mem, …, CKPT_DEAD)

Do not checkpoint mem

Cannot read mem until after include_bytes(mem)

Will not restore mem

Dead Memory: Example

for (…) {
    A = init_A();
    do_stuff(A);
    exclude_bytes(A, …, CKPT_DEAD);
    do_other_stuff();        // assumes it will not read A
    include_bytes(A, …);
}
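The bookkeeping behind the two usage modes can be modeled with a small exclusion table. This is a hypothetical re-implementation for illustration, not libckpt’s code: the `toy_` names and `TOY_READONLY`/`TOY_DEAD` stand in for `exclude_bytes`, `include_bytes`, `CKPT_READONLY`, and `CKPT_DEAD`, and `toy_should_save`/`toy_checkpoint_done` are assumed hooks a checkpoint writer would call.

```c
#include <stddef.h>
#include <stdint.h>

enum usage { TOY_READONLY, TOY_DEAD };

struct excluded {
    uintptr_t start;
    size_t len;
    enum usage usage;
    int saved_once;     /* READONLY: already included in one checkpoint? */
    int in_use;
};

#define MAX_EXCLUDED 16
static struct excluded table[MAX_EXCLUDED];

int toy_exclude_bytes(const void *ptr, size_t len, enum usage usage) {
    for (int i = 0; i < MAX_EXCLUDED; i++)
        if (!table[i].in_use) {
            table[i] = (struct excluded){(uintptr_t)ptr, len, usage, 0, 1};
            return 0;
        }
    return -1;                                 /* table full */
}

int toy_include_bytes(const void *ptr, size_t len) {
    for (int i = 0; i < MAX_EXCLUDED; i++)
        if (table[i].in_use && table[i].start == (uintptr_t)ptr &&
            table[i].len == len) {
            table[i].in_use = 0;
            return 0;
        }
    return -1;                                 /* was not excluded */
}

/* Should the byte at p go into the checkpoint being taken now? */
int toy_should_save(const void *p) {
    uintptr_t a = (uintptr_t)p;
    for (int i = 0; i < MAX_EXCLUDED; i++) {
        const struct excluded *e = &table[i];
        if (!e->in_use || a < e->start || a - e->start >= e->len) continue;
        if (e->usage == TOY_DEAD) return 0;    /* dead: never saved */
        return !e->saved_once;                 /* clean: next ckpt only */
    }
    return 1;                                  /* not excluded */
}

/* Call after each checkpoint completes. */
void toy_checkpoint_done(void) {
    for (int i = 0; i < MAX_EXCLUDED; i++)
        if (table[i].in_use && table[i].usage == TOY_READONLY)
            table[i].saved_once = 1;
}
```

A clean region is written once and then skipped until it is re-included; a dead region is never written, which is why reading it before `include_bytes` after a restart would see garbage.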

Using Memory Exclusion

There can be a dramatic reduction in the size of the checkpoint file

Must be used very carefully

Inadvertently excluding a live region from a checkpoint could cause erroneous behavior on restart

Synchronous Checkpointing

At different points in the program’s execution, the amount of ‘live’ state varies widely:

The stack might be much smaller (shallower call graph)

Heap items might have been deallocated

Regions of memory might be dead or clean

Synchronous Ckpt (2)

If checkpoints are taken at times when there is relatively little live state, the checkpoint file size (and overhead) will be smaller

Allow user to specify where in a program a checkpoint should be taken

Independent of timers (signals)

Sync. Ckpt. Example

for (…) {
    checkpoint_here();
    A = malloc(…);
    do_stuff(A);
    free(A);
}

Synchronous Ckpt (3)

To avoid checkpointing too frequently, the mintime parameter specifies the minimum amount of time between two checkpoints

If checkpoint_here() is called less than mintime seconds after the last checkpoint, the call is ignored

Synchronous Ckpt (4)

To ensure that checkpoints are taken frequently enough to be of use, the maxtime parameter specifies the maximum time allowed to elapse between two checkpoints

If maxtime passes without a checkpoint, an asynchronous (timer-driven) checkpoint is taken
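The mintime/maxtime policy can be sketched as a small state machine. The struct `ckpt_policy` and the functions `toy_checkpoint_here` and `toy_maxtime_check` are hypothetical; time is passed in explicitly (in seconds) so the policy can be tested without a real clock, and the actual checkpoint write is reduced to incrementing a counter.

```c
struct ckpt_policy {
    double mintime;   /* ignore checkpoint_here() sooner than this after the last ckpt */
    double maxtime;   /* force a checkpoint if this long passes without one */
    double last;      /* time of the last checkpoint */
    int taken;        /* checkpoints taken so far */
};

/* Synchronous request: returns 1 if a checkpoint is taken, 0 if ignored. */
int toy_checkpoint_here(struct ckpt_policy *p, double now) {
    if (now - p->last < p->mintime) return 0;
    p->last = now;
    p->taken++;       /* the real library would write the checkpoint here */
    return 1;
}

/* Periodic timer check: returns 1 if maxtime elapsed and a checkpoint is forced. */
int toy_maxtime_check(struct ckpt_policy *p, double now) {
    if (now - p->last < p->maxtime) return 0;
    p->last = now;
    p->taken++;
    return 1;
}
```

With mintime = 10 s and maxtime = 60 s, a request at t = 5 s is ignored, one at t = 12 s is honored, and if no further request succeeds, the timer check forces a checkpoint at t = 72 s.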

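Combining Mem. Exclusion and Sync. Checkpointing

Original program:

main() {
    D = malloc(…);
    f = file;
    while (!done) {
        D = read(f);
        perform_calc(D);
        output_result();
    }
}

With libckpt directives (main() renamed ckpt_target()):

ckpt_target() {
    D = malloc(…);
    f = file;
    while (!done) {
        D = read(f);
        perform_calc(D);
        output_result();
        exclude_bytes(D, …, CKPT_DEAD);   // D is re-read before its next use
        checkpoint_here();
        include_bytes(D, …);
    }
}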