8.4 NTFS Recovery

download 8.4 NTFS Recovery

of 27

Transcript of 8.4 NTFS Recovery

  • 8/14/2019 8.4 NTFS Recovery

    1/27

    Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas PolzeWindows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze

    Unit OS8: File SystemUnit OS8: File System

    8.4. NTFS Recovery Support8.4. NTFS Recovery Support

  • 8/14/2019 8.4 NTFS Recovery

    2/27

    2

    Copyright NoticeCopyright Notice 2000-2005 David A. Solomon and Mark Russinovich 2000-2005 David A. Solomon and Mark Russinovich

    These materials are part of theThese materials are part of the Windows OperatingWindows Operating

    System Internals Curriculum Development Kit,System Internals Curriculum Development Kit,

    developed by David A. Solomon and Mark E.developed by David A. Solomon and Mark E.

    Russinovich with Andreas PolzeRussinovich with Andreas Polze

    Microsoft has licensed these materials from DavidMicrosoft has licensed these materials from David

    Solomon Expert Seminars, Inc. for distribution toSolomon Expert Seminars, Inc. for distribution to

    academic organizations solely for use in academicacademic organizations solely for use in academic

    environments (and not for commercial use)environments (and not for commercial use)

  • 8/14/2019 8.4 NTFS Recovery

    3/27

    3

    Roadmap for Section 8.4Roadmap for Section 8.4

    The Evolution of File SystemsThe Evolution of File Systems

    Recoverable File SystemRecoverable File System

    Log File Service OperationLog File Service Operation

    NTFS Recovery ProceduresNTFS Recovery Procedures

    Fault-Tolerance SupportFault-Tolerance Support

    Volume Management -Volume Management -Striped and Spanned VolumesStriped and Spanned Volumes

  • 8/14/2019 8.4 NTFS Recovery

    4/27

    4

    NTFS Recovery SupportNTFS Recovery Support

    Transaction-based logging schemeTransaction-based logging scheme

    Fast, even for large disksFast, even for large disks

    Recovery is limited to file system dataRecovery is limited to file system data

    Use transaction processing like SQL server for user dataUse transaction processing like SQL server for user data

    Tradeoff: performance versus fully fault-tolerant file systemTradeoff: performance versus fully fault-tolerant file system

    Design options for file I/O & caching:Design options for file I/O & caching:Careful writeCareful write: VAX/VMS fs, other proprietary OS fs: VAX/VMS fs, other proprietary OS fs

    Lazy writeLazy write: most UNIX fs, OS/2 HPFS: most UNIX fs, OS/2 HPFS

  • 8/14/2019 8.4 NTFS Recovery

    5/27

    5

    Careful Write File SystemsCareful Write File Systems

    OS crash/power loss may corrupt file systemOS crash/power loss may corrupt file system

    Careful write file system orders write operations:Careful write file system orders write operations:

    System crash will produce predictable, nonSystem crash will produce predictable, non--critical inconsistenciescritical inconsistencies

    Update to disk is broken in sub operations:Update to disk is broken in sub operations:

    Sub operations are written seriallySub operations are written serially

    Allocating disk space: first write bits in bitmap indicating usage;Allocating disk space: first write bits in bitmap indicating usage;then allocate space on diskthen allocate space on disk

    I/O requests are serialized:I/O requests are serialized:

    Allocation of disk space by one process has to be completed beforeAllocation of disk space by one process has to be completed beforeanother process may create a fileanother process may create a file

    No interleaving sub operations of the two I/O requestsNo interleaving sub operations of the two I/O requests

    Crash: volume stays usable; no need to run repair utilityCrash: volume stays usable; no need to run repair utility

  • 8/14/2019 8.4 NTFS Recovery

    6/27

    6

    Lazy Write File SystemsLazy Write File Systems

    Careful fCareful fileile ssystemystem write sacrifices speed for safetywrite sacrifices speed for safety

    Lazy write improves performance by write back cachingLazy write improves performance by write back caching

    Modifications are written to the cache;Modifications are written to the cache;

    Cache flush is an optimized background activityCache flush is an optimized background activity

    Less disk writes; buffer can be modified multiple timesLess disk writes; buffer can be modified multiple times

    before being written to diskbefore being written to disk

    File system can return to caller before op. is completedFile system can return to caller before op. is completed

    Inconsistent intermediate states on volume are ignoredInconsistent intermediate states on volume are ignored

    Greater risk / user inconvenience if system failsGreater risk / user inconvenience if system fails

  • 8/14/2019 8.4 NTFS Recovery

    7/27

    7

    Recoverable File SystemRecoverable File System(Journaling File System)(Journaling File System)

    Safety of careful write fs / performance of lazy write fsSafety of careful write fs / performance of lazy write fs

    Log file + fast recovery procedureLog file + fast recovery procedure

    Log file imposes some overheadLog file imposes some overhead

    Optimization over lazy write: distance between cache flushesOptimization over lazy write: distance between cache flushesincreasedincreased

    NTFS supportsNTFS supports cache write-throughcache write-through andand cache flushingcache flushingtriggered by applicationstriggered by applications

    No extra disk I/O to update fs data structures necessary:No extra disk I/O to update fs data structures necessary:all changes to fs structure are recorded in log file which can beall changes to fs structure are recorded in log file which can bewritten in a single operationwritten in a single operation

    In the future, NTFS may support logging for user files (hooks inIn the future, NTFS may support logging for user files (hooks inplace)place)

  • 8/14/2019 8.4 NTFS Recovery

    8/27

    8

    Log File Service (LFS)Log File Service (LFS)

    LFS is designed to provide logging to multipleLFS is designed to provide logging to multiplekernel components (clients)kernel components (clients)

    Currently used only by NTFSCurrently used only by NTFS

    Cachemanager

    I/O manager

    NTFS driver

    Access the mapped

    file or flush the cache

    Flush the

    log file

    Write/flush

    the log file

    Log file

    serviceLog the transaction

    Write the

    Volume updates

  • 8/14/2019 8.4 NTFS Recovery

    9/279

    Log File RegionsLog File Regions

    NTFS calls LFS to read/write restart areaNTFS calls LFS to read/write restart area

    Context info: location of logging area to be used for recoveryContext info: location of logging area to be used for recovery

    LFS maintains 2nd copy of restart areaLFS maintains 2nd copy of restart area

    Logging area: circularly reusedLogging area: circularly reused

    LFS uses logical sequence numbers (LSNs) to identify log recordsLFS uses logical sequence numbers (LSNs) to identify log records

    NTFS never reads/writes transactions to log file directlyNTFS never reads/writes transactions to log file directly

    During recovery:During recovery:

    NTFS calls LFS to read forward; recorded transactions are redoneNTFS calls LFS to read forward; recorded transactions are redone

    NTFS calls LFS to read backward; undo all incompletely loggedNTFS calls LFS to read backward; undo all incompletely loggedtransactionstransactions

    LFS restart area infinite logging area

    Copy 1 Copy 2 Log records

  • 8/14/2019 8.4 NTFS Recovery

    10/2710

    Operation of the LFS/NTFSOperation of the LFS/NTFS

    1.1. NTFS calls LFS to record in (cached) log file anyNTFS calls LFS to record in (cached) log file anytransactions that will modify volume structuretransactions that will modify volume structure

    2.2. NTFS modifies the volume (also in the cache)NTFS modifies the volume (also in the cache)

    3.3. Cache manager calls LFS to flush log file to diskCache manager calls LFS to flush log file to disk(LFS implements flushing by calling cache manager back, telling(LFS implements flushing by calling cache manager back, tellingwhich page to flush)which page to flush)

    4.4. After cash manager flushes log file, it flushes volumeAfter cash manager flushes log file, it flushes volumechangeschanges

    ->-> Transactions of unsuccessful modifications can beTransactions of unsuccessful modifications can beretrieved from log file and un-/redoneretrieved from log file and un-/redone

    Recovery begins automatically the first time a volume isRecovery begins automatically the first time a volume isused after system is rebooted.used after system is rebooted.

  • 8/14/2019 8.4 NTFS Recovery

    11/2711

    Log Record TypesLog Record TypesUpdate recordsUpdate records (series of ...)(series of ...)

    Most common; each record contains:Most common; each record contains:

    Redo informationRedo information: how to reapply on subop. of a committed trans.: how to reapply on subop. of a committed trans.

    Undo informationUndo information: how to reverse a partially logged sub operation: how to reverse a partially logged sub operation

    Last record commits the transactionLast record commits the transaction (not shown here)(not shown here)

    Log file records

    ... T1a T1b T1c

    Redo: Allocate/initialize an MFT file record

    Undo: Deallocate the file record

    Redo: Add the filename to the index

    Undo: Remove the filename from the index

    Redo: Set bits 3-9 in the bitmap

    Undo: Clear bits 3-9 in the bitmap

    Recovery: redo committed/undo incompletely logged transact.Recovery: redo committed/undo incompletely logged transact.

  • 8/14/2019 8.4 NTFS Recovery

    12/2712

    Log Records (contd.)Log Records (contd.)

    Physical vs. logical description of redo/undo actions:Physical vs. logical description of redo/undo actions:

    Delete file a.dat vs. Delete byte range on diskDelete file a.dat vs. Delete byte range on disk

    NTFS writes update records with physical descriptionsNTFS writes update records with physical descriptions

    NTFS wNTFS wrrites update records (usually several) for:ites update records (usually several) for:

    Creating a fileCreating a file

    Deleting a fileDeleting a file

    Extending a fileExtending a file

    Truncating a fileTruncating a file

    Setting file informationSetting file information

    Renaming a fileRenaming a file

    Changing security applied to a fileChanging security applied to a file

    Redo/undo ops. must be idempotentRedo/undo ops. must be idempotent(might be applied twice)(might be applied twice)

  • 8/14/2019 8.4 NTFS Recovery

    13/2713

    Checkpoint RecordsCheckpoint Records

    NTFS periodically writes a checkpoint recordNTFS periodically writes a checkpoint record

    Describes, what processing would be necessary to recover aDescribes, what processing would be necessary to recover a

    volume if a crash would occur immediatelyvolume if a crash would occur immediately

    How far back in the log file must NTFS go to begin recoveryHow far back in the log file must NTFS go to begin recovery

    LSN of checkpoint record is stored in restart areaLSN of checkpoint record is stored in restart area

    Log file records

    ... LSN 2058 LSN 2059 LSN 2060

    Checkpoint record

    NTFS restart

  • 8/14/2019 8.4 NTFS Recovery

    14/2714

    Log File FullLog File Full

    LFS presents log file to NTFS as is it were infinitely largeLFS presents log file to NTFS as is it were infinitely largeWriting checkpoint records usually frees up spaceWriting checkpoint records usually frees up space

    LFS tracks several numbers:LFS tracks several numbers:

    Available log spaceAvailable log space

    Amount of space needed to write an incoming log record and to undo theAmount of space needed to write an incoming log record and to undo the

    writewriteAmount of space needed to roll back all active (no committed) transactions,Amount of space needed to roll back all active (no committed) transactions,should that be necessaryshould that be necessary

    Insufficient space: Log file full error & NTFS exceptionInsufficient space: Log file full error & NTFS exception

    NTFS prevents further transactions on files (block creation/deletion)NTFS prevents further transactions on files (block creation/deletion)

    Active transactions are completed or receive log file full exceptionActive transactions are completed or receive log file full exception

    NTFS calls cache manager to flush unwritten dataNTFS calls cache manager to flush unwritten data

    If data is written, NTFS marks log file empty; resets beginning of log fileIf data is written, NTFS marks log file empty; resets beginning of log file

    No effect on executing programs (short I/O pause)No effect on executing programs (short I/O pause)

  • 8/14/2019 8.4 NTFS Recovery

    15/2715

    Recovery - PrinciplesRecovery - Principles

    NTFS performs automatic recoveryNTFS performs automatic recoveryRecovery depends on two NTFS in-memory tables:Recovery depends on two NTFS in-memory tables:

    Transaction tableTransaction table: keeps track of active transactions (not completed): keeps track of active transactions (not completed)(sub operations of these transactions must be removed from disk)(sub operations of these transactions must be removed from disk)

    Dirty page tableDirty page table: records which pages in cache contain: records which pages in cache contain

    modifications to file system structure that have not yet been writtenmodifications to file system structure that have not yet been writtento diskto disk

    NTFS writes checkpoint every 5 sec.NTFS writes checkpoint every 5 sec.

    Includes copy of transaction table and dirty page tableIncludes copy of transaction table and dirty page table

    Checkpoint includes LSNs of the log records containing the tablesCheckpoint includes LSNs of the log records containing the tables

    Dirty page

    table

    Update

    record

    Transaction

    table

    Checkpoint

    record

    Update

    record

    Update

    record

    Begin of checkpoint operation End of checkpoint operation

    Analysis pass

  • 8/14/2019 8.4 NTFS Recovery

    16/2716

    Recovery - PassesRecovery - Passes

    1.1. Analysis passAnalysis pass NTFS scans forward in log file from beginning of last checkpointNTFS scans forward in log file from beginning of last checkpoint

    Updates transaction/dirty page tables it copied in memoryUpdates transaction/dirty page tables it copied in memory

    NTFS scans tables for oldest update record of a nonNTFS scans tables for oldest update record of a non--commitcommittted trans.ed trans.

    1.1. Redo passRedo pass

    NTFS looks for page update records which contain volumeNTFS looks for page update records which contain volumemodification that might not have been flushed to diskmodification that might not have been flushed to disk

    NTFS redoes these updates in the cache until it reaches end of log fileNTFS redoes these updates in the cache until it reaches end of log file

    Cache manager lazy writer thread begins to flush cache to diskCache manager lazy writer thread begins to flush cache to disk

    1.1. Undo passUndo pass

    Roll back any transactions that weren'tt committed when system failedRoll back any transactions that weren'tt committed when system failed

    After undo pass volume is at consistent stateAfter undo pass volume is at consistent state

    Write empty LFS restart area; no recovery is needed if system fails nowWrite empty LFS restart area; no recovery is needed if system fails now

  • 8/14/2019 8.4 NTFS Recovery

    17/2717

    Undo Pass - ExampleUndo Pass - Example

    Transaction 1 was commited before power failureTransaction 1 was commited before power failure

    Transaction 2 was still activeTransaction 2 was still active

    NTFS must log undo operations in log file!NTFS must log undo operations in log file!

    Power might fail again during recovery;Power might fail again during recovery;

    NTFS would have to redo its undo operationsNTFS would have to redo its undo operations

    LSN

    4044

    LSN

    4045

    LSN

    4046

    LSN

    4047

    LSN

    4048

    LSN

    4049

    Transaction committed record

    Redo: Allocate/Initialize an MFT file record

    Undo: Deallocate the file record

    Redo: Add the filename to the index

    Undo: Remove the filename from the index

    Redo: Set bits 3-9 in the bitmap

    Undo: Clear bits 3-9 in the bitmap

    Powerfailure

  • 8/14/2019 8.4 NTFS Recovery

    18/2718

    NTFS Recovery - ConclusionsNTFS Recovery - Conclusions

    Recovery will return volume to some preexisting consistent state (notRecovery will return volume to some preexisting consistent state (not

    necessarily state before crash)necessarily state before crash)

    Lazy commit algorithm: log file is not immediately flushed when a transactionLazy commit algorithm: log file is not immediately flushed when a transactioncommitted record is writtencommitted record is written

    LFS batches records;LFS batches records;

    Flush when cache manager calls or check pointing record is writtenFlush when cache manager calls or check pointing record is written

    (once every 5 sec)(once every 5 sec)Several parallel transactions might have been active before crashSeveral parallel transactions might have been active before crash

    NTFS uses log file mechanisms for error handlingNTFS uses log file mechanisms for error handling

    Most I/O errors are not file system errorsMost I/O errors are not file system errors

    NTFS might create MFT record and detect that disk is full when allocatingNTFS might create MFT record and detect that disk is full when allocating

    space for a file in the bitmapspace for a file in the bitmap

    NTFS uses log info to undo changes and returns disk full error to callerNTFS uses log info to undo changes and returns disk full error to caller

  • 8/14/2019 8.4 NTFS Recovery

    19/2719

    Fault Tolerance SupportFault Tolerance Support

    NTFS capabilities are enhanced by the fault-tolerantNTFS capabilities are enhanced by the fault-tolerantvolume managersvolume managers FtDiskFtDisk/DMIO/DMIO

    Lies above hard disk drivers in the I/O systems layered driverLies above hard disk drivers in the I/O systems layered driverschemescheme

    FtDisk for basic disksFtDisk for basic disks

    DMIO for dynamic disksDMIO for dynamic disks

    Volume management capabilitiesVolume management capabilities::

    Redundant data storageRedundant data storage

    Dynamic data recovery from bad sectors on SCSI disksDynamic data recovery from bad sectors on SCSI disks

    NTFS itself implements bad-sector recovery forNTFS itself implements bad-sector recovery fornon-SCSI disksnon-SCSI disks

  • 8/14/2019 8.4 NTFS Recovery

    20/2720

    Volume Management Features Volume Management Features

    Spanned VolumesSpanned Volumes

    SpannedSpanned VolumeVolumess::

    single logical volume composed of a maximum of 32 areas of free spacesingle logical volume composed of a maximum of 32 areas of free spaceon one or more diskson one or more disks

    NTFS volume sets can be dynamically increased in sizeNTFS volume sets can be dynamically increased in size(only bitmap file which stores allocation status needs to be extended)(only bitmap file which stores allocation status needs to be extended)

    FtDiskFtDisk/DMIO/DMIO hide physical configuration of disks from file systemhide physical configuration of disks from file system

    Tool: WindowsTool: Windows Disk Management MMC snap-inDisk Management MMC snap-in

    Spanned volumes were called volume sets in Windows NT 4.0Spanned volumes were called volume sets in Windows NT 4.0

    C:

    (100 MB)

    E:

    (100 MB)

    D:

    (100 MB)

    D:

    (100 MB)

    Volume set D:

    occupies half of

    two disks

  • 8/14/2019 8.4 NTFS Recovery

    21/2721

    StripeStriped Volumesd Volumes

    Series of partitions, one partition per disk (of same size)Series of partitions, one partition per disk (of same size)

    Combined into a single logical volumeCombined into a single logical volume

    FtDiskFtDisk/DMIO/DMIO optimize data storage and retrieval timesoptimize data storage and retrieval times

    Stripes are narrow: 64KBStripes are narrow: 64KB

    Data tends to be distributed evenly among disksData tends to be distributed evenly among disks

    Multiple pending read/write ops. will operate on different disksMultiple pending read/write ops. will operate on different disks

    Latency for disk I/O is often reduced (parallel seek operations)Latency for disk I/O is often reduced (parallel seek operations)

    (150 MB) (150 MB) (150 MB)

    1 2

    4

    3

  • 8/14/2019 8.4 NTFS Recovery

    22/2722

    Fault Tolerant VolumesFault Tolerant Volumes

    FtDiskFtDisk/DMIO/DMIO implement redundant storage schemesimplement redundant storage schemesMirror setsMirror sets

    Stripe sets with parityStripe sets with parity

    Sector sparingSector sparing

    ToolToolss:: Windows Disk Management MMC snap-inWindows Disk Management MMC snap-in

    Mirrored VolumesMirrored Volumes:: Contents of a partition on one disk are duplicated on another diskContents of a partition on one disk are duplicated on another disk

    FtDiskFtDisk/DMIO/DMIO write same data to both locationswrite same data to both locations

    Read operations are done simultaneously on both disksRead operations are done simultaneously on both disks

    (load balancing)(load balancing)

    C:

    C:

    (mirror)

  • 8/14/2019 8.4 NTFS Recovery

    23/2723

    RAID-5 VolumesRAID-5 Volumes

    Fault tolerant version of a regular stripe setFault tolerant version of a regular stripe set

    Parity: logical sum (XOR)Parity: logical sum (XOR)

    Parity info is distributed evenly over available disksParity info is distributed evenly over available disks

    FtDiskFtDisk/DMIO/DMIO reconstruct missing data by using XOR op.reconstruct missing data by using XOR op.

    parity

  • 8/14/2019 8.4 NTFS Recovery

    24/2724

    Bad Cluster RecoveryBad Cluster Recovery

    Sector sparing is supported by FtDiskSector sparing is supported by FtDisk/DMIO/DMIO

    Dynamic copying of recovered data to spare sectorsDynamic copying of recovered data to spare sectors

    Without intervention from file system / userWithout intervention from file system / user

    Works for certain SCSI disksWorks for certain SCSI disks

    FtDiskFtDisk/DMIO/DMIO return bad sector warning to NTFSreturn bad sector warning to NTFS

    Sector reSector re--mapping is supported by NTFSmapping is supported by NTFS

    NTFS will not reuse bad clustersNTFS will not reuse bad clusters

    NTFS copies data recovered by FtDiskNTFS copies data recovered by FtDisk/DMIO/DMIO into a new clusterinto a new cluster

    NTFS cannot recover data from bad sector without helpNTFS cannot recover data from bad sector without helpfrom FtDiskfrom FtDisk/DMIO/DMIO

    NTFS will never write to bad sector (reNTFS will never write to bad sector (re--map before write)map before write)

  • 8/14/2019 8.4 NTFS Recovery

    25/2725

    Bad-cluster re-mappingBad-cluster re-mapping

    NTFS filenameStandard info Security desc. Data

    DataData

    User file

    StartingVCN

    StartingLCN

    Number ofclusters

    0 1355 2

    2 1049 1

    3 1588 4

    VCN 0 1

    LCN 1355 1356 DataData

    VCN 3 4 5 6

    LCN 1588 1589 1590 1591

    DataData

    VCN 2

    LCN 1049

    NTFS filenameStandard info Security desc. DataStarting

    VCNStarting

    LCNNumber of

    clusters

    0 1357 1

    BadBad

    VCN 0

    LCN 1357

  • 8/14/2019 8.4 NTFS Recovery

    26/2726

    Further ReadingFurther Reading

    Mark E. Russinovich and David A. Solomon,Mark E. Russinovich and David A. Solomon,Microsoft Windows Internals, 4th Edition, MicrosoftMicrosoft Windows Internals, 4th Edition, MicrosoftPress, 2004.Press, 2004.

    Chapter 12 - File SystemsChapter 12 - File SystemsNTFS Recovery Support (from pp. 775)NTFS Recovery Support (from pp. 775)

    Chapter 10 - Storage ManagementChapter 10 - Storage ManagementVolume Management (from pp. 622)Volume Management (from pp. 622)

  • 8/14/2019 8.4 NTFS Recovery

    27/27

    Source Code ReferencesSource Code References

    Windows Research Kernel sources do notWindows Research Kernel sources do not

    include NTFSinclude NTFS

    A raw file system driver is included inA raw file system driver is included in

    \base\ntos\raw\base\ntos\raw

    Also see \base\ntos\fstrl (File System Run-TimeAlso see \base\ntos\fstrl (File System Run-Time

    Library)Library)