Unfortunately
Nothing works perfectly all the time. What will happen if:
Your hard drive crashes.
Your last transaction fails.
The power goes off while you are saving changes to your database.
Wiliwili…..
Recovery
By: Sebbane Mehdi
Supervised by: Dr. H. Haddouti
Monday, April 14th, 2003
Agenda
Intro…
Transaction Failures…
System Failures…
Media Failures…
Two-Phase Commit…
ARIES Recovery Algorithm…
Conclusion…
References…
Intro…
Data warehouses are mission-critical. Downtime can lead to IMPORTANT losses in:
Revenue, Productivity, Profitability & Customers.
Intro… (more)
The most frequent causes of data warehouse downtime are storage related:
Component failure.
Lengthy load times.
Lengthy backups.
Others:
User errors.
System failure.
Failure classification
Synchronous:
Trappable by the operating system.
No loss of data of any kind.
Possible causes: program/logic errors, e.g. division by zero.
Asynchronous:
System crash:
• Assume loss of all data on volatile storage.
• Possible causes: power failure, OS error.
Media crash:
• Loss of data on online and volatile storage.
• Possible causes: damage to storage media, human errors.
What is Recovery???
Recovery = Redundancy….
Simple example:
Periodically, copy or dump the database to archive storage.
For every change, a log entry is made.
If a failure occurs:
A) Database damaged…
B) Content unreliable…
Why not Duplexing??
Have two identical databases and apply every change to both simultaneously.
However:
Twice as much storage.
The 2 copies should be independent, to reduce the chance that a single failure affects both copies.
Very hard to achieve.
Transaction Failures…
Transactions.
Message handling.
Transaction structure.
Transaction failures.
Transactions…
The fundamental purpose of a database system is to carry out transactions.
A transaction is the smallest unit of work… Atomic…
BEGIN TRANSACTION
   recoverable operations
   recoverable operations
   …
   …
COMMIT or ROLLBACK
Transactions… (more)
Recoverable operations are:
all database updates (for which a log entry has been written);
message I/O.
Example:
TRANSFER $1000 3452332 TO 9087665
TRANSFER: PROC;
   GET (FROM, TO, AMOUNT);                      /* read the input message */
   FIND UNIQUE (ACCOUNT WHERE ACCOUNT = FROM);
   ASSIGN (BALANCE - AMOUNT) TO BALANCE;        /* debit the source account */
   IF BALANCE < 0
   THEN
      DO;
         ROLLBACK;                              /* undo the debit */
         PUT ('INSUFFICIENT FUNDS');
      END;
   ELSE
      DO;
         FIND UNIQUE (ACCOUNT WHERE ACCOUNT = TO);
         ASSIGN (BALANCE + AMOUNT) TO BALANCE;  /* credit the target account */
         COMMIT;
         PUT ('TRANSFER COMPLETE');
      END;
END /* TRANSFER */;
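The same all-or-nothing behaviour can be tried on a real DBMS. Below is a minimal sketch using Python's standard sqlite3 module; the table `account`, its columns and the helpers `make_bank`, `transfer` and `balances` are hypothetical, and SQL UPDATE/SELECT stand in for FIND UNIQUE and ASSIGN. Either both updates become permanent at COMMIT, or ROLLBACK leaves the database exactly as it was.

```python
import sqlite3


def make_bank(initial):
    """Create an in-memory account table from a dict: account number -> balance."""
    # isolation_level=None: the module issues no implicit BEGIN; we control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (acct TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", initial.items())
    return conn


def transfer(conn, frm, to, amount):
    """Move `amount` from `frm` to `to` as a single unit of work."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE acct = ?", (amount, frm))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE acct = ?", (frm,)).fetchone()
        if balance < 0:
            conn.execute("ROLLBACK")          # planned end: undo the debit
            return "INSUFFICIENT FUNDS"
        conn.execute("UPDATE account SET balance = balance + ? WHERE acct = ?", (amount, to))
        conn.execute("COMMIT")
        return "TRANSFER COMPLETE"            # message released only after COMMIT
    except Exception:
        conn.execute("ROLLBACK")              # unplanned failure, e.g. unknown account
        raise


def balances(conn):
    """Current balances as a dict: account number -> balance."""
    return dict(conn.execute("SELECT acct, balance FROM account"))
```

A second transfer of $1000 from an account holding $500 returns "INSUFFICIENT FUNDS" and leaves both balances untouched.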
Transactions… (more)
To the user, "transfer x dollars from account A to account B" is a single operation.
It either succeeds or fails.
Succeeds: well & good.
Fails: nothing should have changed in the database.
What about messages??????
Message handling
In the TRANSFER example, the transaction not only updates the database, it also sends messages to the end user:
INSUFFICIENT FUNDS.
TRANSFER COMPLETE.
Message handling is done by the Data Communication (DC) Manager.
Note: output messages should not be transmitted until the planned end of the transaction…
WHY?????
TRANSFER: PROC;
   GET (FROM, TO, AMOUNT);
   FIND UNIQUE (ACCOUNT WHERE ACCOUNT = FROM);
   ASSIGN (BALANCE - AMOUNT) TO BALANCE;
   IF BALANCE < 0
   THEN
      DO;
         ROLLBACK;
         PUT ('INSUFFICIENT FUNDS');
      END;
   ELSE
      DO;
         FIND UNIQUE (ACCOUNT WHERE ACCOUNT = TO);
         ASSIGN (BALANCE + AMOUNT) TO BALANCE;
         PUT ('TRANSFER COMPLETE');
         /* <-- FAILURE here: the user has been told "TRANSFER COMPLETE", but COMMIT never runs */
         COMMIT;
      END;
END /* TRANSFER */;
DC Manager
The DC Manager receives the original input message (giving FROM, TO and AMOUNT) and:
a. writes a log record, and
b. places the message on the input queue.
GET: retrieves a message from the input queue.
PUT: puts a message on the output queue.
COMMIT & ROLLBACK also affect messages.
COMMIT & ROLLBACK
They cause the DC Manager to:
Write a log entry for the messages on the output queue.
Arrange for the actual transmission of those messages.
Remove the message from the input queue.
Note: a transaction failure such as overflow causes the DC Manager to cancel the output messages.
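The DC Manager rules above can be modelled as a small class, assuming one transaction at a time. The class and method names (`DCManager`, `receive`, `end_transaction`, `transaction_failed`) are hypothetical. PUT only queues a message; the queue is transmitted at COMMIT or an explicit ROLLBACK, and discarded on a transaction failure.

```python
from collections import deque


class DCManager:
    """Toy DC manager: output messages wait for the planned end of the transaction."""

    def __init__(self):
        self.log = []                 # stand-in for the system log
        self.input_queue = deque()
        self.output_queue = []        # PUT but not yet transmitted
        self.transmitted = []         # what the end user has actually seen

    def receive(self, message):
        self.log.append(("INPUT", message))   # a. write a log record
        self.input_queue.append(message)      # b. place the message on the input queue

    def get(self):
        return self.input_queue[0]            # GET: read the current input message

    def put(self, message):
        self.output_queue.append(message)     # PUT: queue only, nothing is sent yet

    def end_transaction(self):
        """Called for COMMIT and for an explicit ROLLBACK."""
        self.log.append(("OUTPUT", list(self.output_queue)))  # log the output messages
        self.transmitted.extend(self.output_queue)             # arrange transmission
        self.output_queue.clear()
        self.input_queue.popleft()                             # input message consumed

    def transaction_failed(self):
        """Unplanned termination (e.g. overflow): cancel pending output."""
        self.output_queue.clear()
```

With this design a failure between PUT and COMMIT never shows the user a "TRANSFER COMPLETE" message for a transfer that did not happen.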
Transaction structure
As we can see from the TRANSFER example:
Accept input message;
Perform database processing;
Send output message(s).
Too simple.
What about complex structures with multiple communications???
Complex structures
Two ways to deal with multiple communications:
• Subdivide them into a sequence of simple transactions.
   • The database may be changed in the interval between two "conversations".
• Treat them as one big transaction.
   • At any time the end user must be prepared to get a message like "ignore all previous messages, a failure occurred".
Transaction failures
Transaction-local failures that are detected by the application code itself (INSUFFICIENT FUNDS).
Transaction-local failures that are not explicitly handled by the application code (arithmetic overflow).
System-wide failures (CPU failure) that affect all transactions currently in progress but do not damage the database.
Media failures (disk head failure) that damage the database, or some portion of it, and affect all transactions currently using that portion.
Why???
Conditions that may cause such terminations include:
arithmetic overflow,
division by zero, and
storage protection violation.
Transaction failure means that the program did not reach its planned termination, so the system must perform a ROLLBACK.
How???
Undo all changes the transaction made to the database and cancel all output messages.
There are three basic types of changes:
updating an existing record,
deleting an existing record, or
inserting a new record.
For convenience, the log file should be kept on a direct access device…..
However….
What if…
A failure happens during rollback…
UNDO must be idempotent:
UNDO(UNDO(UNDO(. . . (x)))) = UNDO(x) for all x.
As a transaction is a unit of work… it is also a unit of recovery.
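One common way to make UNDO idempotent, assumed here rather than taken from the slides, is to log the before-image of every changed record and have UNDO set the record back to that image instead of reversing a delta. The sketch covers the three kinds of change (update, delete, insert); `LogRecord`, `undo` and `rollback` are hypothetical names, and the database is a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One change to one record; None means 'record absent'."""
    key: str
    before: Optional[dict]   # None => the change was an INSERT
    after: Optional[dict]    # None => the change was a DELETE


def undo(db, record):
    """Restore the before-image. Setting a value (not subtracting one) is idempotent."""
    if record.before is None:
        db.pop(record.key, None)             # undo INSERT: the record must not exist
    else:
        db[record.key] = dict(record.before)  # undo UPDATE or DELETE: put the old value back
    return db


def rollback(db, log):
    """Undo a transaction's changes, newest first."""
    for record in reversed(log):
        undo(db, record)
    return db
```

If the system fails halfway through `rollback`, simply running it again from the start gives the same final state.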
System Failures…
Checkpoints.
Types of transactions.
REDO.
Write-Ahead Log.
System startup.
System Failures…
By a system failure we mean any event that causes the system to stop and thus requires a system restart.
How does the recovery manager know at restart which transactions to roll back??
Checkpoints.
Checkpoints
They reduce the search time drastically.
The idea is very straightforward: periodically, the system "takes a checkpoint":
1. Force-writing any log records that are still in main storage out to the actual log;
2. Forcing a "checkpoint record" out to the log data set;
3. Force-writing any updates that are still in main storage out to the actual database;
4. Writing the address of the checkpoint record within the log data set into a "restart file".
Checkpoints
Each checkpoint record contains:
A list of all transactions active at the time of the checkpoint, together with
the address within the log of each such transaction's most recent log record.
At restart time, the manager then needs to determine which transactions must be undone and which must be redone.
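The four checkpoint steps and the contents of the checkpoint record can be modelled with in-memory lists standing in for the log, the database and the restart file. This is a sketch only: all names are hypothetical, and a log "address" is simply a position in the `log` list.

```python
class Checkpointer:
    """Toy model of taking a checkpoint (all names hypothetical)."""

    def __init__(self):
        self.log_buffer = []      # log records still in main storage
        self.log = []             # the actual log data set
        self.dirty_pages = {}     # page -> contents, updates still in main storage
        self.database = {}        # the actual database
        self.restart_file = None  # address of the latest checkpoint record
        self.last_lsn = {}        # active transaction -> address of its latest log record

    def write_log(self, txn, entry):
        self.log_buffer.append((txn, entry))
        # Address this record will have once the buffer is flushed.
        self.last_lsn[txn] = len(self.log) + len(self.log_buffer) - 1

    def end_transaction(self, txn):
        self.write_log(txn, "COMMIT")
        del self.last_lsn[txn]                      # no longer active

    def take_checkpoint(self):
        self.log.extend(self.log_buffer)            # 1. force-write buffered log records
        self.log_buffer.clear()
        self.log.append(("CHECKPOINT", dict(self.last_lsn)))  # 2. active txns + latest records
        address = len(self.log) - 1
        self.database.update(self.dirty_pages)      # 3. force-write buffered updates
        self.dirty_pages.clear()
        self.restart_file = address                 # 4. remember where the record is
        return address
```

At restart, reading `restart_file` leads straight to the checkpoint record, so only the log written after it needs to be examined.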
Types of transactions
Consider the following:
A system failure has occurred at time t2.
The most recent checkpoint prior to time t2 was taken at time t1.
[Figure: timeline of transactions T1 to T5, with the checkpoint at time t1 and the system crash at time t2]
Types…
Transactions of type T1 completed before time t1.
Transactions of type T2 started after time t1 and completed before time t2.
Transactions of type T3 started prior to time t1 and completed after t1 and before time t2.
Types…
Transactions of type T4 started prior to time t1 but did not complete by time t2.
Finally, transactions of type T5 started after time t1 but did not complete by time t2.
Types….
What will happen at restart????
Undo T4 & T5.
But also redo T2 & T3.
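One straightforward way for the recovery manager to sort transactions at restart is sketched below; the function name `classify` and the record format are hypothetical, and only BEGIN and COMMIT records written after the checkpoint matter here. Start with UNDO = the transactions listed in the checkpoint record and REDO empty; scanning forward, a BEGIN adds the transaction to UNDO, and a COMMIT moves it from UNDO to REDO.

```python
def classify(checkpoint_active, log_after_checkpoint):
    """Return (undo, redo) lists for restart.

    checkpoint_active: transactions listed in the most recent checkpoint record.
    log_after_checkpoint: (kind, txn) pairs in log order, kind "BEGIN" or "COMMIT".
    """
    undo = list(checkpoint_active)   # everything active at the checkpoint
    redo = []
    for kind, txn in log_after_checkpoint:
        if kind == "BEGIN":
            undo.append(txn)         # started after the checkpoint
        elif kind == "COMMIT":
            undo.remove(txn)         # finished before the crash: redo it instead
            redo.append(txn)
    return undo, redo
```

For the scenario above, T3 and T4 are active at t1, and one possible ordering of the later log is BEGIN T2, COMMIT T3, BEGIN T5, COMMIT T2. The result is UNDO = T4, T5 and REDO = T3, T2; T1 appears in neither list.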
REDO
The recovery manager must be able to trace the log and invoke REDO for the appropriate transactions.
REDO must be idempotent too.
Handling messages:
reschedule transactions of type T2 and T3;
force-write the log records for input messages.
Write-Ahead Log Protocol
Up to now, changing the database and writing the log record are two separate operations….
What will happen if a failure occurs in the interval between the two?
Write-Ahead Log…
Write-Ahead Log Protocol
For safety, the log record should always be written first:
A transaction is not allowed to write a record to the physical database until at least the undo portion of the corresponding log record has been written to the physical log.
A transaction is not allowed to complete COMMIT processing until both the redo and the undo portions of all log records for the transaction have been written to the physical log.
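A toy buffer manager that enforces both rules, assuming log records are kept in memory until flushed and that an LSN is simply a position in the log; the class and method names are hypothetical. Writing a page first forces the log up to that page's latest record, and COMMIT forces the whole log tail.

```python
from collections import deque


class WALBufferManager:
    """Toy buffer manager enforcing the two write-ahead log rules."""

    def __init__(self):
        self.log_tail = deque()   # log records still in main storage
        self.stable_log = []      # the physical log
        self.buffer = {}          # page -> contents in main storage
        self.disk = {}            # the physical database
        self.page_lsn = {}        # page -> LSN of the record that last changed it

    def _append(self, txn, kind, payload):
        lsn = len(self.stable_log) + len(self.log_tail)   # sequentially increasing
        self.log_tail.append((lsn, txn, kind, payload))
        return lsn

    def update(self, txn, page, value):
        before = self.buffer.get(page, self.disk.get(page))
        lsn = self._append(txn, "UPDATE", (page, before, value))  # undo + redo portions
        self.buffer[page] = value
        self.page_lsn[page] = lsn

    def flush_log(self, up_to_lsn):
        while self.log_tail and self.log_tail[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_tail.popleft())

    def write_page(self, page):
        # Rule 1: the log record reaches the physical log before the page does.
        self.flush_log(self.page_lsn[page])
        self.disk[page] = self.buffer[page]

    def commit(self, txn):
        # Rule 2: COMMIT completes only once all of txn's log records are stable.
        lsn = self._append(txn, "COMMIT", None)
        self.flush_log(lsn)
```

Because the log is sequential, forcing it up to one LSN also forces every earlier record, including those of other transactions.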
System startup
How does a system react to failures…
3 types:
Emergency restart: the process that is invoked after a system failure has occurred. It involves the recovery procedures (UNDO or REDO).
Warm start: the process of starting up the system after a controlled system shutdown, e.g. on receipt of a SHUTDOWN command…
Cold start: starting the system from scratch, i.e. the process of starting the system after some disastrous failure that makes a warm start impossible. It involves starting again from some archive version of the database.
Media Failures…
Media failures…
A media failure is a failure in which a portion of the secondary storage medium is damaged.
The recovery process consists of:
restoring the database from an archive dump, then
using the log to redo the transactions run since that dump was taken.
Media failures…
A media failure has occurred….
All current transactions are abnormally terminated.
A new device is allocated to replace the one that failed.
A utility program is then run which:
a. loads the database onto the new device from the most recent archive dump, and
b. uses the log to redo all the transactions that completed since the dump was taken.
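The utility program's two steps can be sketched as follows, assuming the archive dump is a transaction-consistent copy taken at log position `dump_lsn` (no transaction active at that moment) and that each UPDATE log record carries the new value, i.e. its redo portion. The function name and record layout are hypothetical.

```python
import copy


def media_recover(archive_dump, log, dump_lsn):
    """Rebuild a database lost to a media failure.

    archive_dump: dict key -> value, assumed consistent as of dump_lsn.
    log: list of (lsn, txn, kind, payload); kind is "UPDATE" with payload
    (key, new_value), or "COMMIT".
    """
    db = copy.deepcopy(archive_dump)          # a. load the most recent dump
    committed = {txn for _, txn, kind, _ in log if kind == "COMMIT"}
    for lsn, txn, kind, payload in log:       # b. redo, in log order, completed work
        if lsn > dump_lsn and kind == "UPDATE" and txn in committed:
            key, value = payload
            db[key] = value
    return db
```

Updates of transactions that never committed (T3 in the test data) are simply not redone, so nothing has to be undone afterwards.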
Two-Phase Commit…
Two-Phase commit
Required whenever a transaction is able to invoke multiple independent resource managers.
No separate COMMITs: the transaction issues a single "global" COMMIT to the coordinator.
The coordinator then goes through two phases:
Two-Phase commit
Phase I:
Request all resource managers to get themselves into a state from which they can either commit or roll back.
If a resource manager succeeds in reaching this state, it replies "OK" to the coordinator.
Phase II:
If all replies are "OK", broadcast the command "COMMIT" to all.
Otherwise, broadcast the command "ROLLBACK".
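A minimal coordinator sketch, assuming in-process participants and ignoring timeouts and coordinator crashes. `ResourceManager`, its `can_prepare` flag and `two_phase_commit` are hypothetical names. As in standard 2PC, the coordinator records its decision in its own log before broadcasting it.

```python
class ResourceManager:
    """Hypothetical participant; can_prepare simulates whether Phase I succeeds."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        # Phase I: reach a state from which either outcome is possible
        # (a real resource manager would force its log here), then vote.
        if self.can_prepare:
            self.state = "prepared"
            return "OK"
        return "NOT OK"

    def finish(self, decision):
        self.state = decision                   # "COMMIT" or "ROLLBACK"


def two_phase_commit(participants, coordinator_log):
    votes = [rm.prepare() for rm in participants]              # Phase I
    decision = "COMMIT" if all(v == "OK" for v in votes) else "ROLLBACK"
    coordinator_log.append(decision)            # decision is logged before broadcast
    for rm in participants:                                     # Phase II
        rm.finish(decision)
    return decision
```

A single "NOT OK" vote is enough to roll back every participant.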
ARIES Recovery Algorithm…
Reminder
Up to now:
Transaction failures.
System failures.
Media failures.
WAL, 2PC, and checkpoints.
All this is good, but how can we combine it to get better recovery……
The concept.
Data structures used in ARIES.
ARIES in detail.
ARIES features.
Why ARIES.
ARIES optimizations.
ARIES Recovery Algorithm
The dominant crash recovery algorithm in commercial DBMSs.
Based on three concepts:
Write-ahead logging;
Repeating history during REDO:
retrace all actions of the DBMS prior to the crash to reconstruct the database state at the time of the crash.
Logging changes during UNDO:
prevents ARIES from repeating completed undo operations when a failure occurs during recovery.
ARIES Recovery Algorithm
Step 1: Analysis
Identify updated (dirty) pages in the buffer.
Identify the transactions active when the crash occurred.
Identify the point in the log where redo should start.
Step 2: REDO
Redo operations are applied until the end of the log.
This includes writes from uncommitted transactions.
Only necessary redo operations are applied.
ARIES Recovery Algorithm
Step 3: UNDO
The log is scanned backward.
Updates from transactions still active at the crash are undone.
Data Structures Used in ARIES
ARIES Data Structures
A log sequence number (LSN) identifies each log record.
Must be sequentially increasing.
Typically an offset from the beginning of the log file, to allow fast access.
Easily extended to handle multiple log files.
ARIES Data Structures
Each page contains a PageLSN, which is the LSN of the last log record whose effects are reflected on the page.
To update a page:
Lock the page, and write the log record;
Update the page;
Record the LSN of the log record in PageLSN;
Unlock the page.
PageLSN is used during recovery to prevent repeated redo, thus ensuring idempotence.
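The update protocol and the PageLSN check can be sketched in a few lines. Here an LSN is the record's position in an in-memory list rather than a file offset, and `Page`, `Log`, `update_page` and `redo_if_needed` are hypothetical names.

```python
import threading


class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.data = {}
        self.page_lsn = -1               # LSN of the last log record reflected here
        self.latch = threading.Lock()


class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1     # LSN: sequentially increasing position


def update_page(log, page, txn, key, value):
    with page.latch:                                                  # lock the page
        lsn = log.append((txn, page.page_id, key, page.data.get(key), value))  # log first
        page.data[key] = value                                        # update the page
        page.page_lsn = lsn                                           # record LSN in PageLSN
    return lsn                                                        # lock released here


def redo_if_needed(page, lsn, key, value):
    """Apply a logged change only if the page does not already reflect it."""
    if page.page_lsn < lsn:
        page.data[key] = value
        page.page_lsn = lsn
        return True
    return False
```

Redoing the same record twice is harmless: the second call sees PageLSN already equal to the record's LSN and does nothing.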
ARIES Data Structures
Each log record contains the LSN of the previous log record of the same transaction (PrevLSN).
A special redo-only log record, called a compensation log record (CLR), is used to log actions taken during recovery that never need to be undone.
It has a field UndoNextLSN that notes the next (earlier) record to be undone.
Required to avoid repeated undo of already undone actions.
Update log record: LSN | TransID | PrevLSN | RedoInfo | UndoInfo
CLR: LSN | TransID | UndoNextLSN | RedoInfo
ARIES Data Structures
DirtyPageTable
List of pages in the buffer that have been updated.
Contains, for each such page:
PageLSN of the page;
RecLSN: an LSN such that log records before this LSN have already been applied to the page version on disk.
Set to the current end of log when a page is inserted into the dirty page table (just before being updated).
Recorded in checkpoints; helps to minimize redo work.
ARIES Data Structures
Checkpoint log record. Contains:
DirtyPageTable and list of active transactions.
For each active transaction, LastLSN: the LSN of the last log record written by the transaction.
A fixed position on disk notes the LSN of the last completed checkpoint log record.
ARIES in Detail…
ARIES Recovery: Analysis
Starts from the last complete checkpoint log record.
Reads in the DirtyPageTable from the log record.
Sets RedoLSN = min of the RecLSNs of all pages in DirtyPageTable.
In case no pages are dirty, RedoLSN = checkpoint record's LSN.
Sets undo-list = list of transactions in the checkpoint log record.
Reads the LSN of the last log record for each transaction in undo-list from the checkpoint log record.
ARIES Recovery: Analysis
Scans forward from the checkpoint.
If any log record is found for a transaction not in undo-list, adds the transaction to undo-list.
Whenever an update log record is found:
If the page is not in DirtyPageTable, it is added with RecLSN set to the LSN of the update log record.
If a transaction end log record is found, deletes the transaction from undo-list.
Keeps track of the last log record for each transaction in undo-list.
May be needed for later undo.
ARIES Recovery: Analysis
At the end of the analysis pass:
RedoLSN determines where to start the redo pass.
RecLSN for each page in DirtyPageTable is used to minimize redo work.
All transactions in undo-list need to be rolled back.
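A sketch of the analysis pass over an in-memory log. Assumptions: the log is a dict from LSN to record; the checkpoint record stores the transaction table (LastLSN and status) and the DirtyPageTable (page to RecLSN); every other record names its transaction. Besides the undo-list, the sketch keeps a commit status, as in the "After analysis" transaction table, so that a transaction that committed but has no end record yet is not rolled back. `EXAMPLE_LOG` is made up, chosen only so the result matches the "After analysis" tables; the log shown on the original slide is not recoverable.

```python
def analysis(log, checkpoint_lsn):
    """ARIES analysis pass; returns (RedoLSN, DirtyPageTable, txn table, undo-list)."""
    ckpt = log[checkpoint_lsn]
    dirty_pages = dict(ckpt["dirty_pages"])                     # page -> RecLSN
    txn_table = {t: [last, status] for t, (last, status) in ckpt["txn_table"].items()}
    for lsn in sorted(n for n in log if n > checkpoint_lsn):    # scan forward
        rec = log[lsn]
        txn = rec["txn"]
        if rec["type"] == "end":
            txn_table.pop(txn, None)                            # fully finished
            continue
        entry = txn_table.setdefault(txn, [lsn, "in progress"])  # unseen txn: add it
        entry[0] = lsn                                          # LastLSN, for later undo
        if rec["type"] == "commit":
            entry[1] = "commit"
        elif rec["type"] in ("update", "clr"):
            dirty_pages.setdefault(rec["page"], lsn)            # new dirty page: RecLSN
    redo_lsn = min(dirty_pages.values(), default=checkpoint_lsn)
    undo_list = {t: last for t, (last, status) in txn_table.items() if status != "commit"}
    return redo_lsn, dirty_pages, txn_table, undo_list


# Hypothetical log, chosen to reproduce the "After analysis" tables.
EXAMPLE_LOG = {
    1: {"type": "update", "txn": "T1", "page": "C"},
    2: {"type": "update", "txn": "T2", "page": "B"},
    3: {"type": "commit", "txn": "T1"},
    4: {"type": "checkpoint",
        "txn_table": {"T1": (3, "commit"), "T2": (2, "in progress")},
        "dirty_pages": {"C": 1, "B": 2}},
    5: {"type": "update", "txn": "T2", "page": "B"},
    6: {"type": "update", "txn": "T3", "page": "A"},
    7: {"type": "update", "txn": "T2", "page": "C"},
    8: {"type": "commit", "txn": "T2"},
}
```

`analysis(EXAMPLE_LOG, 4)` gives RedoLSN 1, the dirty pages C, B, A with RecLSNs 1, 2, 6, and an undo-list containing only T3 (LastLSN 6).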
Our log
[Figure: the example log]
Our tables
[Figure: the example tables, with the point of the CRASH marked]
After the crash
Transaction table:
TransID  LastLSN  Status
T1       3        commit
T2       2        in progress
Dirty page table:
PageID  LSN
C       1
B       2

After analysis
Transaction table:
TransID  LastLSN  Status
T1       3        commit
T2       8        commit
T3       6        in progress
Dirty page table:
PageID  LSN
C       1
B       2
A       6
ARIES Redo Pass
Redo pass: repeats history by replaying every action not already reflected in the page on disk, as follows:
Scans forward from RedoLSN. Whenever an update log record is found:
1. If the page is not in DirtyPageTable, or the LSN of the log record is less than the RecLSN of the page in DirtyPageTable, then skip the log record.
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched from disk is less than the LSN of the log record, redo the log record.
NOTE: if either test leads to a skip, the effects of the log record have already appeared on the page. The first test avoids even fetching the page from disk!
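The two tests translate directly into code. A minimal sketch, assuming the log is a dict from LSN to record, update and CLR records carry the page, key and new value, and each on-disk page is a dict holding its `page_lsn` and data; `redo_pass` and these field names are hypothetical.

```python
def redo_pass(log, redo_lsn, dirty_pages, disk_pages):
    """Repeat history from RedoLSN; returns the LSNs actually redone.

    dirty_pages: page -> RecLSN, as produced by the analysis pass.
    disk_pages: page -> {"page_lsn": int, "data": dict}.
    """
    redone = []
    for lsn in sorted(n for n in log if n >= redo_lsn):     # scan forward
        rec = log[lsn]
        if rec["type"] not in ("update", "clr"):
            continue
        page_id = rec["page"]
        # Test 1: page not dirty, or change older than its RecLSN: skip without a fetch.
        if page_id not in dirty_pages or lsn < dirty_pages[page_id]:
            continue
        page = disk_pages[page_id]                           # "fetch" the page
        # Test 2: PageLSN shows the change is already on the disk version.
        if page["page_lsn"] >= lsn:
            continue
        page["data"][rec["key"]] = rec["value"]              # reapply, committed or not
        page["page_lsn"] = lsn
        redone.append(lsn)
    return redone
```

Running the pass a second time redoes nothing, because every redone page now carries the record's LSN.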
ARIES: Undo Pass
Performs a backward scan on the log, undoing all transactions in undo-list.
The backward scan is optimized by skipping unneeded log records as follows:
The next LSN to be undone for each transaction is set to the LSN of the last log record for that transaction found by the analysis pass.
At each step, pick the largest of these LSNs, skip back to it, and undo it.
6565
ARIES: Undo Pass
After undoing a log record:
For ordinary log records, set the next LSN to be undone for the transaction to the PrevLSN noted in the log record.
For compensation log records (CLRs), set the next LSN to be undone to the UndoNextLSN noted in the log record.
All intervening records are skipped, since they would have been undone already.
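A minimal sketch of this backward scan, assuming a hypothetical LogRecord class and an in-memory log keyed by LSN. A max-heap (negated LSNs with heapq) picks the largest pending LSN at each step, and every undone update writes a CLR whose UndoNextLSN is the undone record's PrevLSN. T3's LastLSN of 6 comes from the tables above; the records themselves and the second loser T4 are made up for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int
    trans_id: str
    kind: str                             # "update" or "clr"
    prev_lsn: Optional[int]               # previous record of the same transaction
    page_id: str = ""
    old_value: str = ""                   # before-image, used to undo an update
    undo_next_lsn: Optional[int] = None   # CLRs only: next record still to undo

def undo_pass(log, undo_list, pages):
    """Roll back every transaction in undo_list (trans_id -> LastLSN).

    log maps LSN -> LogRecord and receives the new CLRs; pages maps
    page_id -> current value. Returns the undone update LSNs in order.
    """
    next_lsn = max(log) + 1
    last_written = dict(undo_list)        # PrevLSN to use for each new CLR
    heap = [-lsn for lsn in undo_list.values()]
    heapq.heapify(heap)                   # max-heap via negated LSNs
    undone = []
    while heap:
        rec = log[-heapq.heappop(heap)]   # largest LSN still waiting to be undone
        tid = rec.trans_id
        if rec.kind == "update":
            pages[rec.page_id] = rec.old_value
            log[next_lsn] = LogRecord(next_lsn, tid, "clr", last_written[tid],
                                      rec.page_id, undo_next_lsn=rec.prev_lsn)
            last_written[tid] = next_lsn
            next_lsn += 1
            undone.append(rec.lsn)
            follow = rec.prev_lsn         # ordinary record: continue at PrevLSN
        else:
            follow = rec.undo_next_lsn    # CLR: skip work that was already undone
        if follow is not None:
            heapq.heappush(heap, -follow)
    return undone

# Hypothetical log: T3 had already undone LSN 5 before the crash (CLR at LSN 6).
log = {
    3: LogRecord(3, "T3", "update", None, "A", "a0"),
    4: LogRecord(4, "T4", "update", None, "D", "d0"),
    5: LogRecord(5, "T3", "update", 3, "A", "a3"),
    6: LogRecord(6, "T3", "clr", 5, "A", undo_next_lsn=3),
}
pages = {"A": "a3", "D": "d4"}
undone = undo_pass(log, {"T3": 6, "T4": 4}, pages)
print(undone, pages)  # [4, 3] {'A': 'a0', 'D': 'd0'}
```

LSN 5 is never revisited: the CLR at LSN 6 shows it was undone before the crash, so the scan jumps straight to LSN 3.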
ARIES Features
Recovery independence:
Pages can be recovered independently of others. E.g., if some disk pages fail, they can be recovered from a backup while other pages are being used.
Savepoints:
Transactions can record savepoints and roll back to a savepoint. This is used to roll back just enough to release locks on deadlock.
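A savepoint can be sketched as nothing more than the transaction's LastLSN at the moment it is taken; rolling back follows the PrevLSN chain until it reaches that LSN. The Transaction class and its methods are hypothetical, lock release is not modeled, and instead of writing CLRs the sketch simply resumes the chain at the savepoint, which is a simplification of what ARIES actually logs.

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    trans_id: str
    last_lsn: int = 0  # 0 = no log records yet
    # Hypothetical per-transaction log: LSN -> (prev_lsn, page_id, old_value)
    records: dict = field(default_factory=dict)

    def log_update(self, lsn, page_id, old_value, new_value, pages):
        """Log an update (LSNs must increase) and apply it to the page."""
        self.records[lsn] = (self.last_lsn, page_id, old_value)
        self.last_lsn = lsn
        pages[page_id] = new_value

    def savepoint(self):
        # A savepoint only needs to remember the current LastLSN.
        return self.last_lsn

    def rollback_to(self, savepoint, pages):
        """Undo, newest first, every update logged after `savepoint`."""
        undone, lsn = [], self.last_lsn
        while lsn > savepoint:
            prev_lsn, page_id, old_value = self.records[lsn]
            pages[page_id] = old_value
            undone.append(lsn)
            lsn = prev_lsn
        # Simplification: instead of writing CLRs, resume the PrevLSN chain
        # at the savepoint so later work and rollbacks skip the undone records.
        self.last_lsn = lsn
        return undone

pages = {"X": 0, "Y": 0}
t = Transaction("T1")
t.log_update(1, "X", 0, 10, pages)
sp = t.savepoint()
t.log_update(2, "Y", 0, 20, pages)
t.log_update(3, "X", 10, 30, pages)
undone = t.rollback_to(sp, pages)
print(undone, pages)  # [3, 2] {'X': 10, 'Y': 0}
```

The transaction keeps the work done before the savepoint (X = 10) and stays active, which is what lets a deadlock victim give back only the locks it needs to.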
ARIES Features
Recovery optimizations, for example:
The dirty page table can be used to prefetch pages during redo.
Out-of-order redo is possible: redo can be postponed on a page being fetched from disk and performed when the page arrives. Meanwhile, other log records can continue to be processed.
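Out-of-order redo works because the redo decision for a page depends only on that page's PageLSN and that page's own log records. A minimal sketch, assuming hypothetical (lsn, page_id, new_value) tuples for a log that already starts at RedoLSN and dict-based pages: records are queued per page, and each queue is replayed whenever its page arrives, in whatever order the pages arrive.

```python
from collections import defaultdict

def queue_redo_by_page(log, dirty_page_table):
    """Bucket redo-eligible update records by page, keeping LSN order per page."""
    queues = defaultdict(list)
    for lsn, page_id, new_value in log:              # log is in increasing LSN order
        rec_lsn = dirty_page_table.get(page_id)
        if rec_lsn is not None and lsn >= rec_lsn:   # same first test as the redo pass
            queues[page_id].append((lsn, new_value))
    return queues

def apply_when_fetched(page, queue):
    """Called once `page` has arrived from disk: replay its queued records in order."""
    for lsn, new_value in queue:
        if page["page_lsn"] < lsn:                   # same PageLSN test as the redo pass
            page["value"], page["page_lsn"] = new_value, lsn

# Hypothetical log, dirty page table and on-disk pages.
log = [(1, "A", "a1"), (2, "B", "b2"), (3, "A", "a3")]
dirty_page_table = {"A": 1, "B": 2}
disk = {"A": {"page_lsn": 0, "value": "a0"}, "B": {"page_lsn": 0, "value": "b0"}}

queues = queue_redo_by_page(log, dirty_page_table)
for page_id in ["B", "A"]:                            # pages happen to arrive B first
    apply_when_fetched(disk[page_id], queues[page_id])
print(disk)
```

The keys of the dirty page table are exactly the pages worth prefetching, since the first redo test skips every record for a page outside it.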
MORE for ARIES
Simple.
Incorporates numerous optimizations to reduce overheads during normal processing and to speed up recovery.
ARIES has:
1. Log sequence numbers (LSNs) to identify log records. LSNs are stored in pages to identify which updates have already been applied to a database page.
2. Physiological redo.
3. A dirty page table to avoid unnecessary redos during recovery.
4. Fuzzy checkpointing, which only records information about dirty pages and does not require dirty pages to be written out at checkpoint time.
ARIES Optimizations
Physiological redo:
The affected page is physically identified; the action within the page can be logical.
Used to reduce logging overheads, e.g. when a record is deleted and all other records have to be moved to fill the hole:
Physiological redo can log just the record deletion.
There is no need to log the process of filling the newly created hole.
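To make the logging saving concrete, here is a sketch that models a page as a Python list of records, where deleting slot i shifts every later record left. physical_log lists what physical logging would have to capture (one before/after pair per changed slot); physiological_log names the page physically but the action logically. The page ID "P1" and the tuple formats are hypothetical.

```python
# A page is modeled as a list of records; slot i holds page[i].

def delete_and_compact(page, slot):
    """The logical action: remove one record and shift later records to fill the hole."""
    del page[slot]

def physical_log(page, slot):
    """What physical logging would record: (slot, before, after) for every changed slot."""
    after = page[:slot] + page[slot + 1:] + [None]   # None marks the freed last slot
    return [(i, page[i], after[i]) for i in range(slot, len(page))]

def physiological_log(page_id, slot):
    """Physiological logging: the page is named physically, the action logically."""
    return (page_id, "delete", slot)

def redo(pages, record):
    """Redo a physiological record by re-running the logical action on that page."""
    page_id, action, slot = record
    if action == "delete":
        delete_and_compact(pages[page_id], slot)

pages = {"P1": ["r0", "r1", "r2", "r3", "r4"]}
physical = physical_log(pages["P1"], 1)   # computed before the delete is applied
record = physiological_log("P1", 1)
redo(pages, record)
print(len(physical), record, pages["P1"])  # 4 ('P1', 'delete', 1) ['r0', 'r2', 'r3', 'r4']
```

Unlike a physical before/after image, redoing the deletion is not idempotent on its own, which is why it relies on the PageLSN test from the redo pass.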
ARIES Optimizations
Fuzzy checkpointing is done as follows:
1. Temporarily stop all updates by transactions.
2. Write a <checkpoint L> log record, where L is the list of active transactions, and force the log to stable storage.
3. Note the list M of modified pages.
4. Now permit transactions to proceed with their actions.
5. Write all modified pages in list M to disk.
6. Store a pointer to the checkpoint record in a fixed position last_checkpoint on disk.
ARIES Optimizations
When recovering using a fuzzy checkpoint, start the scan from the checkpoint record pointed to by last_checkpoint.
Log records before last_checkpoint have their updates reflected in the database on disk and need not be redone.
Incomplete checkpoints, where the system crashed while performing a checkpoint, are handled safely: last_checkpoint is only updated after all pages in M have been written, so it still points to the previous completed checkpoint.
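The six checkpoint steps and the recovery rule can be simulated together. This is a single-threaded sketch under stated assumptions: the Storage and Buffer classes are hypothetical, the log list is treated as already forced to stable storage (so writing a page in step 5 respects write-ahead logging), and the concurrency of steps 1 and 4 is not modeled. The crash_before_step6 flag simulates a crash after the pages are written but before last_checkpoint is updated.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Storage:
    log: list = field(default_factory=list)         # stable log, already forced
    disk_pages: dict = field(default_factory=dict)  # page_id -> value on disk
    last_checkpoint: Optional[int] = None           # fixed disk slot: log position

@dataclass
class Buffer:
    pages: dict = field(default_factory=dict)       # page_id -> in-memory value
    dirty: set = field(default_factory=set)

def fuzzy_checkpoint(storage, buffer, active, crash_before_step6=False):
    # Step 1 (stop updates) is implicit: this single-threaded sketch runs alone.
    # Step 2: write <checkpoint L> (L = active transactions) to the stable log.
    storage.log.append(("checkpoint", sorted(active)))
    checkpoint_pos = len(storage.log) - 1
    # Step 3: note the list M of modified pages.
    m = sorted(buffer.dirty)
    # Step 4: transactions could resume here; none are modeled.
    # Step 5: write every page in M to disk.
    for page_id in m:
        storage.disk_pages[page_id] = buffer.pages[page_id]
        buffer.dirty.discard(page_id)
    if crash_before_step6:
        return                                      # simulated crash
    # Step 6: only now point last_checkpoint at the new checkpoint record.
    storage.last_checkpoint = checkpoint_pos

def recovery_scan_start(storage):
    """Scan from the last completed checkpoint, or from the start of the log."""
    return 0 if storage.last_checkpoint is None else storage.last_checkpoint

s = Storage(log=[("update", "T1", "A"), ("update", "T1", "B")])  # hypothetical records
b = Buffer(pages={"A": "a1", "B": "b1"}, dirty={"A", "B"})
fuzzy_checkpoint(s, b, {"T1"})                      # completes: pointer -> position 2
b.pages["A"] = "a2"
b.dirty.add("A")
s.log.append(("update", "T1", "A"))
fuzzy_checkpoint(s, b, {"T1"}, crash_before_step6=True)
print(recovery_scan_start(s))  # 2: the interrupted checkpoint at position 4 is ignored
```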
ARIES family
ARIES for shared disks
ARIES for semi-structured data
ARIES/CSA
ARIES/RRH
ARIES/NT
ARIES/KVL
ARIES/IM
ARIES/LHS
Conclusion
Implementing the right recovery procedure is as important as benchmarking your warehouse product's performance.
The big players (SQL, DB2, …) have their own recovery techniques specific to them.
They also have their own recovery products.
Just for DB2:
Figure 1
Conclusion
If you want to account for catastrophic failures:
Back up the entire database.
Back up the system log more frequently.
That may be costly…
Conclusion
The recovery procedure should be proportionate to:
The amount of data to protect.
The importance of this data.
Thank you
Q&A
References…
Date Book: Chapter 1 recovery…
Database Management Systems: http://discovery.csc.ncsu.edu/~pning/Courses/csc742-Spring-02/T15_Recovery_6.pdf
ARIES Recovery Algorithm: Recovery with aries… http://www-2.cs.cmu.edu/afs/cs/academic/class/15721-f01/www/lectures/recovery_with_aries.pdf
Figure 1: http://www.bmc.com/products/database/resourcecenter/db2backup_recovery.pdf
Figure 2: http://www2.cs.cmu.edu/afs/cs/academic/class/15721f01/www/lectures/recovery_with_aries.pdf