EMC MULTISITE DISASTER RECOVERY FOR MICROSOFT SQL SERVER 2012
SQL Server Disaster Recovery 101
description
Transcript of SQL Server Disaster Recovery 101
![Page 2: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/2.jpg)
Transaction log• Every data modification is first written to
tlog• Before the transaction commits • Delayed durability in 2014!
• Data pages are updated in memory • CHECKPOINT writes dirty pages to disk
• Both committed and uncommitted • Frequency depends on recovery interval
• When TX rolls back, state of the data prior to update is read from tlog
• Space in transaction log is being freed and reused • When exactly this happens, depends on recovery model used
![Page 3: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/3.jpg)
Transaction log and recovery • If SQL Server service shuts down, recovery will
happen on the next startup • These can happen: • Dirty pages from uncommitted tx not written to disk • Dirty pages from committed tx written to disk • Dirty pages from uncommitted tx written to disk
• Have to be rolled back (undo) • Dirty pages from committed tx not written to disk
• Have to be rolled forward (redo) • Without the tlog, transactional consistency of the
database cannot be guaranteed
![Page 4: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/4.jpg)
Recovery models • Three recovery models in SQL Server
• SIMPLE recovery model • FULL • BULK-LOGGED
• Model database is in FULL recovery model • If you want newly created databases be in SIMPLE, change
recovery model for model database
![Page 5: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/5.jpg)
SIMPLE recovery model• Full and differential backups only • Transaction log is not backed up • Space in tlog is freed when the data is
committed and checkpointed to disk • Eliminates the need for managing tlog
space • Restore only to the most recent backup • Consider not using SIMPLE model on
production system
![Page 6: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/6.jpg)
FULL recovery model • Full, differential, and log backups • Log backups are required • Space in tlog is freed when the data is
committed and checkpointed to disk and the log is backed up
• Provides full recoverability • Up to the point of failure • Point in time restore
![Page 7: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/7.jpg)
BULK-LOGGED recovery model • Same as the FULL model, but:
• Bulk operation are only minimally logged • In general, only allocation and deallocation of extents is
logged • Increased performance and lower log
space consumption • Use FULL, but switch to BULK-LOGGED
for bulk operations • Bulk insert and bcp • SELECT INTO • ALTER INDEX … REBUILD • etc
![Page 8: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/8.jpg)
Backup and restore • Full, differential, and transaction log
backup • Log backup in full and bulk-logged model
• Backup strategy • How much data I am permitted to lose? • How much time I am permitted to be down?
• Use differential backups to minimize the restore time
• Check your backups • If you can’t restore from backup..
• Use WITH CHECKSUM option • Backup chain
• Use COPY_ONLY
![Page 9: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/9.jpg)
Full Backups• Creates starting point for all backups• Contains all used pages in db• Can run concurrently to transactions
• ~2% processor overhead• Completely non-blocking backup
• Transaction log• Database growth/shrink
9
![Page 10: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/10.jpg)
Full Backups - Operation• Checkpoint• Mark log• Dump all used pages• Mark log• Copy portion of log between marks
10
Transaction LogMark log
Data changesMark log
![Page 11: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/11.jpg)
Differential Backups• Contains only pages changed since
last full backup• Differential bitmap
• NOT an incremental backup
11
0001001110001001000101100
Bitmap
![Page 12: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/12.jpg)
Transaction Log• Backup the inactive portion of log
• Committed transaction• Clears backed up portion on
completion• Only allowed in full recovery or
bulk_logged mode
12
![Page 13: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/13.jpg)
Partial Backups• Partial database
• Primary filegroup• Read/write filegroups
• Partial differential• Primary filegroup differential• Read/write filegroups differential
13
![Page 14: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/14.jpg)
2014 backup and restore• Backup on-premise database to cloudBACKUP DATABASE AdventureWorks2012 TO URL = 'https://mystorageaccount.blob.core.windows.net/mycontainer/AdventureWorks2012.bak' WITH CREDENTIAL = 'mycredential'
• Supports:• DATABASE, LOG, DIFFERENTIAL, COPY_ONLY, PARTIAL,
COMPRESSION, CHECKSUM• Restore from cloud backup
• SQL Server Managed Backup to Windows Azure• smart_admin
• Backup encryption
![Page 15: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/15.jpg)
I/O errors • Three types
• 823 (a hard I/O error) • Page can’t be read from disk
• 824 (a soft I/O error) • Page was read, but is torn
(inconsistent) • 825 (a read-retry error)
• Page was read, but after retries • Logged in msdb..suspect_pages • Input into single-page restore operations
![Page 16: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/16.jpg)
PAGE_VERIFY • Page is composed of 16 512-byte
sectors • It is possible to be not written entirely • Allows fast detection of problems • Three options
• NONE • TORN_PAGE_DETECTION • Writes two control bits in each sector
• CHECKSUM • Calculates and writes a checksum of data on the page • Written last, checked first
• Failures cause 824 error
![Page 17: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/17.jpg)
DBCC CHECKDB • Checks the logical and physical integrity
of all the objects in the specified database • DBCC CHECKDB db_name WITH ALL_ERRORMSGS, NO_INFOMSGS
• How often? It depends. • DBCC CHECKDB could take a long time to
run • Depends on
• Size of database • Load • Number and type of corruptions
• Run DBCC CHECKDB on restored database
![Page 18: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/18.jpg)
DBCC CHECKDB, contd• Did CHECKDB fail?
• If it stops before completing successfully, something bad has happened that is preventing CHECKDB from running
• This means there is no choice but to restore from a backup as CHECKDB cannot be forced to run (and hence repair)
• Examples of fatal (to CHECKDB) errors • 7984 – 7988: corruption in critical system tables • 8967: invalid states within CHECKDB itself • 8930: corrupt metadata in the database such that
CHECKDB could not run
![Page 19: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/19.jpg)
• Are the corruptions only in non-clustered indexes? • If recommended repair level is REPAIR_REBUILD, then YES! • Otherwise, check all the index IDs in the errors – if they’re
all greater than 1, then YES! • If YES, you *could* just rebuild the
corrupt indexes • Depends on the error, and the size of the index • But, what caused the corruption? • If you just rebuild the indexes, the corruption will probably
happen again (especially if caused by the IO subsystem) • Make sure you do root-cause analysis and take
preventative measures
![Page 20: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/20.jpg)
• Was there an un-repairable error found? • 8909, 8938, 8939 (page header corruption) errors where the type is
‘PFS’ • 8970 error: invalid data for the column type • 8992 error: CHECKCATALOG (metadata mismatch) error • Plus a few more obscure ones
• E.g. an 8904 error (extent is allocated to two objects). This is usually repairable except in the case where the extent is marked as mixed and dedicated, and has pages allocated to multiple objects. • The repair is too complicated and/or destructive so is not attempted.
• None of these can be automatically repaired
![Page 21: SQL Server Disaster Recovery 101](https://reader035.fdocuments.net/reader035/viewer/2022062323/568161b9550346895dd18a65/html5/thumbnails/21.jpg)
Restore or repair? • REPAIR_ALLOW_DATA_LOSS is
destructive • Lot of errors cannot be fixed
automatically • Restore is better if you have a good
backup and proper backup strategy