Unit v Recovery and Security Mechanism

download Unit v Recovery and Security Mechanism

of 35

Transcript of Unit v Recovery and Security Mechanism

  • 8/2/2019 Unit v Recovery and Security Mechanism

    1/35

  • 8/2/2019 Unit v Recovery and Security Mechanism

    2/35

    Recovery refers to restoring a system to its normal

    operational state.

    For eg if the process fails, the resources allocated to the

    failed process be reclaimed.

    If one or more cooperating processes fails, then the

    efforts due to interaction of the failed processes with the

    other processes must be undone or every failed process

    would have to restart from an appropriate state.

    If the site fails, recovery in this case involves the

    question of how not to expose the system to data

    inconsistencies & bring back the failed site to an up to

    date state consistent with the rest of the system.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    3/35

    System : h/w & s/w

    Failure : when the system does not perform itsservices in the manner specified.

    An Erroneous state of the system is a state whichcould lead to a system failure by a sequence of validstate transitions.

    A faultis an anomalous physical conditions.

    Cause of this includes design errors, manufacturing

    problems, damage, external disturbances. An error is the part of the system state of errors.

    Failure recovery is a process that involvesrestoring an erroneous state to an error free state.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    4/35

    Figure An error is a manifestation of fault and can

    lead to failure.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    5/35

    Process failure

    System failure

    Secondary storage failure

    Communication medium failureProcess failure:

    The computation results in an incorrect outcome,

    the process causes the system state to deviate from

    specifications, the process may fail to progress. Etc.

    Errors causing to fail a process : deadlocks,

    timeouts, protection violation, wrong input

    provided by the user, consistency violation.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    6/35

    System failure:

    Occurs when processor fails to execute.

    It is caused by the s/w errors & h/w problems. In case of system failure, it is stopped & restarted

    from the correct state i.e predefined state.

    Classified into following:

    1) An amnesia failure occurs when a system restartsin a predefined state that does not depend upon

    the state of the system before its failure.

    2) A partial amnesia failure occurs when a system

    restarts in a state wherein a part of the state is

    same as the state before the failure & the rest of the

    state is predefined. Eg file server crashes.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    7/35

    3) A pause failure occurs when a system restarts in

    a same state it was in before the failure.

    4) A halting failure occurs when a crashed systemnever restarts.

    Secondary storage failure

    When thestored data cannot be accessed.

    Cause : parity error, head crash, or dust particlessettled on the medium.

    Its contents are corrupted & must be reconstructed

    from archive systems & log files.

    Communication medium failure

    Occurs when a site can not communicate with

    another operational site in the network.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    8/35

    Cause : failure of switching node includes system

    failure & secondary storage failure, & link failure

    includes physical rupture & noise in thecommunication channels.

    May not cause total shutdown of the system.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    9/35

    Error is that part of the state that differs from itsintended value and can lead to a system failure, andfailure recovery is a process that involves restoring anerroneous state to an error-free state.

    Two approaches for restoring an erroneous state to anerror free state.

    If the nature of errors and damages caused by faults canbe completely and accurately assessed, then it ispossible to remove those errors in the process's

    (system's) state and enable the process (system) tomove forward. This technique is known asforward-error recovery.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    10/35

    If it is not possible to foresee the nature of faults and

    to remove all the errors in the process's (system's)

    state, then the process's (system's) state can berestored to a previous error-free state of the process

    (system). This technique is known as backward-error

    recovery.

    backward-error recovery is simpler than forward-

    error recovery as it is independent of the fault and the

    errors caused by the fault.

    Problem with backward-error recovery :

  • 8/2/2019 Unit v Recovery and Security Mechanism

    11/35

    Performance penalty: The overhead to restore a process

    (system) state to a prior state can be quite high.

    There is no guarantee that faults will not occur again whenprocessing begins from a prior state.

    Some component of the system state may be

    unrecoverable. For example, cash dispensed at an

    automatic teller machine cannot be recovered.

    The forward-error recovery technique, on the other

    hand, incurs less overhead because only those parts

    of the state that deviate from the intended value

    need to be corrected

  • 8/2/2019 Unit v Recovery and Security Mechanism

    12/35

    In backward-error recovery, a process is restored to a

    prior state in the hope that the prior state is free of

    errors.

    The points in the execution of a process to which the

    process can later be restored are known as recoverypoints.

    Recovery done at the process level is simply a subset of

    the actions necessary to recover the entire system.

    In a system recovery, all the user processes that wereactive need to be restored to their respective recovery

    points and data (in secondary storage) modified by the

    processes need to be restored to a proper state.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    13/35

    There are two ways to implement

    the operation based approach and

    the state-based approach

    System Model

  • 8/2/2019 Unit v Recovery and Security Mechanism

    14/35

    audit trail or a log: state of a process are recorded

    in sufficient detail so that a previous state of the

    process can be restored by reversing all the changes

    made to the state.

    UPDATING-IN-PLACE. : every update (write) is to be

    recorded in log file & stable storage.

    The information recorded includes:

    1) the name of the object,2) the old state of the object (used for UNDO), and

    3) the new state of the object (used for REDO).

  • 8/2/2019 Unit v Recovery and Security Mechanism

    15/35

    A recoverable update operation can be implemented

    as a collection of operations as follows:

    1) A do operation, which does the action (update) andwrites a log record.

    2) An undo operation, which, given a log record written

    by a do operation, undoes the action performed by

    the do operation.3) A redo operation, which, given a log record written

    by a do operation, redoes the action specified by

    the do operation.

    4) An optional display operation, which displays the log

    record.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    16/35

    The major problem with the updating-in-

    place is that a do operation cannot be

    undone if the system crashes after anupdate operation but before the log record

    is stored.

    This problem is overcome by the write-ahead-log protocol

  • 8/2/2019 Unit v Recovery and Security Mechanism

    17/35

    the complete state of a process is saved

    when a recovery point is established and

    recovering a process involves reinstating its

    saved state and resuming the execution ofthe process from that state.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    18/35

    The process of saving state is also referred

    to as checkpointing or taking acheckpoint.

    The recoverypoint at which checkpointing

    occurs is often referred to as a checkpoint.

    The process of restoring a process to a

    prior-state is referred to as rolling back the

    process.

    A special case of the state-based recovery

    approach is the technique based on shadow

    pages.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    19/35

    if one of the cooperating processes fails and

    resumes execution from a recovery point; thenthe effects it has caused at other processes due

    to the information it has exchanged with them

    after establishing the recovery point will haveto be undone.

    To undo the effects caused by a failed process

    at an active process, the active process must

    also rollback to an earlier state.

    Thus, in concurrent systems, all cooperating

    processes need to establish recovery points.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    20/35

    Rolling back of processes can cause further

    problems:

    Orphan Messages and the Domino Effect

    Lost messages

    Problem of Livelocks

  • 8/2/2019 Unit v Recovery and Security Mechanism

    21/35

  • 8/2/2019 Unit v Recovery and Security Mechanism

    22/35

  • 8/2/2019 Unit v Recovery and Security Mechanism

    23/35

    In distributed systems involves taking acheckpoint by all the processes (sites) or at least

    by a set of processes (sites) that interact with one

    another in performing a distributed computation.

    Typically, in distributed systems, all the sites save

    their local states, which are known as local

    checkpoints, and the process of saving local states is

    called local check pointing. All the local checkpoints, one from each site,

    collectively form aglobal checkpoint

  • 8/2/2019 Unit v Recovery and Security Mechanism

    24/35

    STRONGLY CONSISTENT SET OF CHECKPOINTS :

    To overcome the domino effect, a set of localcheckpoints is needed (one for each process

    in the set) such that no information flow

    takes place (i.e., no orphan messages)between any pair of processes in the set, as

    well as between any process in the set and

    any process outside the set during the

    interval spanned by the checkpoints.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    25/35

    Such a set of checkpoints is known as a recovery line

    or a strongly consistent set of checkpoints.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    26/35

    CONSISTENT SET OF CHECKPOINTS:

    A consistent set of ckeckpoints is similar to a

    consistent global state in that it requires that each

    message recorded as received in a checkpoint

    (state) should also be recorded as sent in another

    checkpoint (state).

    Therefore, systems that do not establish a strongly

    consistent set of checkpoints have to deal with lost

    messages during roll back recovery.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    27/35

    checkpointing and recovery technique proposedby Koo and Toueg that takes a consistent set of

    checkpoints and avoids livelock problems during

    recovery.

    The algorithm's approach is said to be

    synchronous, as the processes involved coordinate

    their local checkpointing actions such that the set

    of all recent checkpoints in the system isguaranteed to be consistent.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    28/35

    The checkpoint algorithm assumes the followingcharacteristics for the distributed system:

    Processes communicate by exchanging messages

    through communication channels

    Channels are FIFO in nature.

    Communication failures do not partition the network.

    The checkpoint algorithm takes two kinds of

    checkpoints on stable storage, permanent andtentative.

    A permanent checkpoint is a local checkpoint at a

    process and is a part of a consistent global checkpoint.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    29/35

    A tentative checkpoint is a temporary checkpoint

    that is made a permanent checkpoint on the

    successful termination of the checkpoint

    algorithm. Processes roll back only to their permanent

    checkpoint.

    The algorithm has two phases. First Phase. :An initiating process Pi takes a

    tentative checkpoint and requests all the processes

    to take tentative checkpoints.

    Each process informs Pi whether it succeeded intaking a tentative checkpoint.

    A process says "no" to a request if it fails to take a

    checkpoint, which could be due to several reasons,

    depending upon the underlying application.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    30/35

    IfPi learns that all the processes have successfully

    taken tentative checkpoints, Pi decides that all

    tentative checkpoints should be made permanent;

    otherwise, Pi decides. that all the tentativecheckpoints should be discarded.

    Second Phase:Pi informs all the processes of the

    decision it reached at the end of the first phase.

    A process, on receiving the message from Pi, will

    act accordingly.

    Therefore, either all or none of the processes take

    permanent checkpoints. The algorithm requires that every process, once it

    has taken a tentative checkpoint not send

    messages related to the underlying computation

    until it is informed ofPiSdecision.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    31/35

    The rollback recovery algorithm assumes that asingle process invokes the algorithm, as opposed to

    several processes concurrently invoking it to

    rollback and recover.

    It also assumes that the checkpoint and the rollback

    recovery algorithms are not concurrently invoked.

    The rollback recovery algorithm has two phases.

    First Phase. An initiating process Pi checks to see ifall the processes are willing to restart from their

    previous checkpoints.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    32/35

    A process may reply "no" to a restart request if it is

    already participating in a checkpointing or a

    recovering process initiated by some other

    process.

    Second Phase. Pi propagates its decision to all the

    processes. On receiving PiS decision, a process

    will act accordingly. The recovery algorithm requires that every

    process not send messages related to the

    underlying computation while it is waiting for Pis

    decision .

  • 8/2/2019 Unit v Recovery and Security Mechanism

    33/35

    synchronous checkpointing simplifies recovery(because a consistent set of checkpoints is readilyavailable), it has the following disadvantages:

    1) Additional messages are exchanged by the

    checkpoint algorithm when it takes eachcheckpoint.

    2) Synchronization delays are introduced duringnormal operations.

    3) If failures rarely occur between successivecheckpoints, then the synchronous approachplaces unnecessary burden on the system in theform of additional messages, delays, and

    processing overhead.

  • 8/2/2019 Unit v Recovery and Security Mechanism

    34/35

    To minimize the amount of computation undone

    during a roll back, all incoming messages are

    logged (stored on stable storage) at each

    processor.

    The messages that were received after

    establishing a recovery point can be processed

    again in the event of a roll back to the recoverypoint.

    The messages received can be logged in two ways :

    pessimistic and optimistic .

    In pessimistic message logging, an incoming

    message is logged before it is processed. A

    drawback of this approach is that it slows down

    the underlying computation, even when there are

    no failures .

  • 8/2/2019 Unit v Recovery and Security Mechanism

    35/35

    In optimistic message logging, processors continue to

    perform the computation and the messages received

    are stored in volatile storage, which are logged atcertain intervals.

    A Scheme for Asynchronous Checkpointing and

    Recovery :

    1. The communication channels are reliable.2. The communication channels deliver the messages in

    the order they were sent.

    3. The communication channels are assumed to have

    infinite buffers.

    4. The message transmission delay is arbitrary, but .finite.

    5. The underlying computation is assumed to be event-

    d i