15: Filesystem Examples: Ext3, NTFS -...

Page 1: 15: Filesystem Examples: Ext3, NTFS - UCLnrg.cs.ucl.ac.uk/mjh/3005/2008/15-filesystem-examples2.pdf · NTFS NTFS replaces FAT file system in recent Windows releases. Design from scratch:

15: Filesystem Examples: Ext3, NTFS

Mark Handley

Linux Ext3 Filesystem

Problem: Recovery after a crash

fsck on a large disk can be extremely slow.
  An issue for laptops: power failure is common.
  An issue for highly available servers: failure is rare, but recovery must be reliable and fast.

With a Journaling File System (JFS), there is no need to check the whole disk.
  After a crash, re-read the journal from the last checkpoint.

Journaling Filesystem

Atomically updated.
  Old and new versions of data are held on disk until the update is committed.

Undo logging:
  Copy old data to the log.
  Write new data to the disk.
  If you crash during update, copy old data from the log.

Redo logging:
  Write new data to the log.
  Old data remains on disk until commit.
  If you crash during update, copy new data from the log (committed updates only).
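The two schemes can be contrasted with a minimal sketch in C. It assumes a one-block "disk" and a one-entry log held in memory; `struct store`, `enum crash`, and the four functions are hypothetical names introduced only for illustration, not part of any real filesystem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-block "disk" plus a one-entry log. */
struct store {
    int  disk;       /* the block's home location */
    int  log;        /* logged copy: old value (undo) or new value (redo) */
    bool log_valid;  /* a log entry is present */
    bool committed;  /* redo only: commit record has been written */
};

enum crash { NO_CRASH, CRASH_BEFORE_COMMIT, CRASH_AFTER_COMMIT };

/* Undo logging: save the old data, then overwrite the disk in place. */
void undo_update(struct store *s, int new_value, bool crash_midway)
{
    assert(s != NULL);
    s->log = s->disk;         /* 1. copy old data to the log */
    s->log_valid = true;
    s->disk = new_value;      /* 2. write new data to the disk */
    if (crash_midway)
        return;               /* crash: the log entry survives */
    s->log_valid = false;     /* update complete: entry no longer needed */
}

/* Undo recovery: roll the disk back to the logged old data. */
void undo_recover(struct store *s)
{
    if (s->log_valid)
        s->disk = s->log;
    s->log_valid = false;
}

/* Redo logging: write new data to the log; the disk keeps the old data
   until after the commit record is written. */
void redo_update(struct store *s, int new_value, enum crash c)
{
    assert(s != NULL);
    s->log = new_value;
    s->log_valid = true;
    s->committed = false;
    if (c == CRASH_BEFORE_COMMIT)
        return;
    s->committed = true;      /* commit point */
    if (c == CRASH_AFTER_COMMIT)
        return;
    s->disk = s->log;         /* flush to the home location */
    s->log_valid = false;
    s->committed = false;
}

/* Redo recovery: re-apply the logged new data only if it was committed. */
void redo_recover(struct store *s)
{
    if (s->log_valid && s->committed)
        s->disk = s->log;
    s->log_valid = false;
    s->committed = false;
}
```

In this sketch an undo-logged update that crashes midway is rolled back to the old value, while a redo-logged update survives a crash only if the commit record was written first.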

Journal Data and Transactions

The journal is fixed size, stored on disk, and used as a circular buffer. It contains:
  Metadata: entire contents of a single block of filesystem metadata, as updated by the transaction.
  Descriptor: where the metadata really lives on disk.
  Header: head and tail of the journal (in the circular buffer).

Each disk update is an atomic transaction.
  Write new data to the journal.
  Not complete until a commit.
  Only after commit is the update final.
  Will be flushed to disk in due course.
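The head and tail bookkeeping of a fixed-size circular journal can be sketched as follows. This is a simplified illustration, not Ext3's actual on-disk layout: the size `JOURNAL_BLOCKS`, the struct, and the function names are assumptions, and the sketch takes `head` to mean the next free block and `tail` the oldest block still in use.

```c
#include <assert.h>
#include <stdbool.h>

#define JOURNAL_BLOCKS 8   /* assumed journal size, in blocks */

struct journal {
    unsigned head;   /* next free block (where the next write goes) */
    unsigned tail;   /* oldest block still in use */
    unsigned used;   /* number of blocks currently in use */
};

unsigned journal_free(const struct journal *j)
{
    return JOURNAL_BLOCKS - j->used;
}

/* Reserve n blocks at the head, wrapping around the end of the journal.
   Returns false if the journal does not have n free blocks. */
bool journal_append(struct journal *j, unsigned n)
{
    if (n > journal_free(j))
        return false;
    j->head = (j->head + n) % JOURNAL_BLOCKS;
    j->used += n;
    return true;
}

/* Release the n oldest blocks once their updates have reached
   their real locations on disk. */
void journal_release(struct journal *j, unsigned n)
{
    assert(n <= j->used);
    j->tail = (j->tail + n) % JOURNAL_BLOCKS;
    j->used -= n;
}
```

Because the journal is fixed size, an append can fail; space only becomes available again when older transactions are released from the tail.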

Commit

Transaction is committed.
  Subsequent file system operations will go in a new transaction.
  Flush the transaction to the journal on disk, and pin the memory buffers because the data is not yet in the right place on disk.
  After it is flushed, update the journal header blocks.
  Sync the journal transaction to disk (to its final location).
  Unpin the memory buffers.
  Release space in the journal.

Crash Recovery

Only completed updates have been committed.
During reboot, committed transactions in the journal are re-applied.
Old and updated data are each stored separately until the commit block is written to the journal on disk.
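Recovery by replaying committed transactions can be sketched as below. The record layout (`struct rec`), the tiny `DISK_BLOCKS` disk, and the function names are hypothetical and much simpler than a real journal format; the point is only that updates without a commit record are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DISK_BLOCKS 4   /* assumed tiny disk for the sketch */

enum rec_type { REC_UPDATE, REC_COMMIT };

struct rec {
    enum rec_type type;
    int txn;    /* transaction the record belongs to */
    int home;   /* REC_UPDATE: block number where the data really lives */
    int data;   /* REC_UPDATE: new contents of that block */
};

/* A transaction is committed if the journal holds its commit record. */
static bool is_committed(const struct rec *log, size_t n, int txn)
{
    for (size_t i = 0; i < n; i++)
        if (log[i].type == REC_COMMIT && log[i].txn == txn)
            return true;
    return false;
}

/* Re-apply, in journal order, every update whose transaction committed. */
void replay(const struct rec *log, size_t n, int disk[DISK_BLOCKS])
{
    for (size_t i = 0; i < n; i++) {
        if (log[i].type != REC_UPDATE)
            continue;
        assert(log[i].home >= 0 && log[i].home < DISK_BLOCKS);
        if (is_committed(log, n, log[i].txn))
            disk[log[i].home] = log[i].data;
    }
}
```

Replaying is safe to repeat: applying the same committed updates twice leaves the disk in the same state, so a crash during recovery just means running it again.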

Ext3 vs Ext2 vs LSFS

Ext3 is completely backwards compatible with Ext2.
  Just adds a journal in a special file.
  Does not change the basic filesystem structure, inodes, directories, etc.

A log-structured filesystem ONLY contains a log.
  Everything is written to the end of the log.

NTFS Filesystem

File System API Calls in Windows 2000, XP…

[Table: principal Win32 API functions for file I/O. The second column gives the nearest UNIX equivalent.]

Windows 2000: File System API

Windows API calls have very many parameters. E.g. CreateFile() has 7 parameters:
  Pointer to filename to open/create.
  Flags for read/write/both.
  Flags for whether multiple processes can simultaneously open the file.
  Pointer to security descriptor telling who can access the file.
  Flags telling what to do if the file exists/doesn't exist.
  Flags dealing with attributes such as archiving and compression.
  The handle of a file whose attributes should be cloned for the new file.

Windows 2000 File System API: Copying a File

    /* Declarations (BUF_SIZE is assumed to be defined elsewhere) */
    HANDLE inhandle, outhandle;
    char buffer[BUF_SIZE];
    DWORD count, ocnt;
    BOOL s;

    /* Open Files for Input and Output */
    inhandle = CreateFile("data", GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
    outhandle = CreateFile("new", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL, NULL);

    /* Copy the File: stop on a read error or at end of file (count == 0) */
    do {
        s = ReadFile(inhandle, buffer, BUF_SIZE, &count, NULL);
        if (s && count > 0)
            WriteFile(outhandle, buffer, count, &ocnt, NULL);
    } while (s && count > 0);

    /* Close the Files */
    CloseHandle(inhandle);
    CloseHandle(outhandle);

Windows 2000 File System API: System Calls for Directory Management

[Table: Win32 API functions for directory management. The second column gives the nearest UNIX equivalent, when one exists.]

NTFS

NTFS replaces the FAT file system in recent Windows releases.
  Design from scratch: complex and fully featured.

Each volume (partition) is a linear sequence of blocks.
  4KB block size is typical.
  64-bit block IDs.

Each volume has a Master File Table (MFT).
  Sequence of 1KB records.
  One (or more) record per file or directory.
  Somewhat like i-nodes, but more flexible.

Each MFT record is a sequence of variable length (attribute, value) pairs.
  Long attributes can be stored externally, and a pointer kept in the MFT record.
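The idea of a record as a sequence of (attribute, value) pairs, with long values kept externally, can be sketched in C. This is a simplified model, not the NTFS on-disk format: real records are variable-length byte sequences, whereas the sketch uses a fixed array, and the type names, field names, and the bound of 8 attributes are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative attribute kinds; not the real NTFS type codes. */
enum attr_type { ATTR_STANDARD_INFO, ATTR_FILE_NAME, ATTR_DATA };

struct attr {
    enum attr_type type;
    bool resident;                 /* true: value stored inside the record */
    union {
        const char *inline_value;  /* resident: the value itself */
        uint64_t first_block;      /* external: pointer to where the value lives */
    } v;
};

struct mft_record {
    size_t nattrs;
    struct attr attrs[8];          /* assumed upper bound for the sketch */
};

/* Scan the record's pairs for the first attribute of the given type. */
const struct attr *find_attr(const struct mft_record *r, enum attr_type t)
{
    assert(r != NULL);
    for (size_t i = 0; i < r->nattrs; i++)
        if (r->attrs[i].type == t)
            return &r->attrs[i];
    return NULL;
}
```

A caller would check `resident` on the returned attribute to decide whether the value can be read directly from the record or must be fetched from the external blocks.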

NTFS Master File Table

First 16 entries are reserved for NTFS metadata files.

MFT is itself a file.
  1st record describes the MFT file itself (where the blocks are on disk).

MFT MetaData Attributes

$LogFile: when many changes to the filesystem are made, they're logged here first. If the system goes down, consistency can be recovered by reading the log.
$AttrDef: MFT attributes are defined here, allowing extensibility.
$Bitmap: keeps track of free blocks.
$Boot: points to the bootstrap loader for OS booting.
$Upcase: defines filename case mapping (for non-roman alphabets).

File System Structure (2)

The attributes used in MFT records

NTFS File Block Management

NTFS tries to allocate files in runs of consecutive blocks.
Unlike with FAT, files can contain holes.
In an MFT record, blocks are described by a sequence of DATA attributes, one for each section between holes.
Within each DATA attribute, there are multiple fields, each indicating a run of consecutive disk blocks.
If all the attributes don't fit into one MFT record, extension records can be used to hold more.
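Translating a file block number into a disk block through a list of runs can be sketched as follows. The structs, the `NOT_MAPPED` sentinel, and the function name are hypothetical, and the layout is a simplification of one DATA section rather than the real on-disk encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_MAPPED UINT64_MAX   /* sentinel: block is not covered by this section */

struct run {
    uint64_t disk_start;   /* first disk block of the run */
    uint64_t length;       /* number of consecutive blocks in the run */
};

struct data_section {
    uint64_t first_file_block;   /* file offset (in blocks) where the section starts */
    size_t nruns;
    const struct run *runs;
};

/* Walk the runs in order, subtracting each run's length until the
   requested block falls inside one of them. */
uint64_t lookup_block(const struct data_section *s, uint64_t file_block)
{
    assert(s != NULL);
    if (file_block < s->first_file_block)
        return NOT_MAPPED;
    uint64_t offset = file_block - s->first_file_block;
    for (size_t i = 0; i < s->nruns; i++) {
        if (offset < s->runs[i].length)
            return s->runs[i].disk_start + offset;
        offset -= s->runs[i].length;
    }
    return NOT_MAPPED;
}
```

For example, with three illustrative runs `{20, 4}`, `{64, 2}`, `{80, 3}` starting at file block 0 (nine blocks in total), file block 4 is the first block of the second run and maps to disk block 64. A block that no section maps lies in a hole.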

MFT Record for Normal File

An MFT record for a three-run, nine-block file

Extension MFT Records

A file that requires three MFT records to store its runs.
  Typically because the file is very fragmented or very large.

MFT Record for a Small Directory

The MFT record for a small directory.
  Directory entries are stored as a simple list.
  Large directories use B+ trees instead.

NTFS File Name Lookup

Steps in looking up the file C:\maria\web.htm
  First prepend \?? to the filename, and look it up in the \?? directory.

NTFS File Compression

API can specify that a file should be compressed by the filesystem.

OS attempts to compress 16 blocks at a time.
  If compression reduces them to 15 blocks or fewer, the compressed blocks are written to disk.
  Otherwise the uncompressed blocks are written.

Runs of compressed blocks use two DATA runs in the MFT: one for the compressed data blocks, and one for how much compression was achieved.

Seeking is not terribly efficient:
  Must decompress 16 blocks at a time to find the correct uncompressed block.
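The 16-block rule and the seek cost can be made concrete with a short sketch. The constant and function names are hypothetical; only the rule itself (compress 16 blocks at a time, keep the result if it is 15 blocks or fewer) comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNIT_BLOCKS 16   /* blocks the OS attempts to compress at a time */

/* Keep the compressed form only if it saves at least one block. */
bool store_compressed(unsigned compressed_blocks)
{
    assert(compressed_blocks > 0);
    return compressed_blocks <= UNIT_BLOCKS - 1;
}

/* Disk blocks actually written for one 16-block unit. */
unsigned blocks_on_disk(unsigned compressed_blocks)
{
    return store_compressed(compressed_blocks) ? compressed_blocks : UNIT_BLOCKS;
}

struct unit_pos {
    uint64_t unit;     /* which 16-block unit holds the file block */
    unsigned offset;   /* position of the block inside that unit */
};

/* To read one file block, the whole unit containing it must be
   decompressed first; this finds that unit. */
struct unit_pos locate(uint64_t file_block)
{
    struct unit_pos p;
    p.unit = file_block / UNIT_BLOCKS;
    p.offset = (unsigned)(file_block % UNIT_BLOCKS);
    return p;
}
```

As an illustration, a 48-block file whose three units compress to 8, 16, and 8 blocks occupies 8 + 16 + 8 = 32 blocks on disk, with the middle unit stored uncompressed. Reading file block 37 means decompressing unit 2 and taking block 5 within it.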

NTFS File Compression

(a) An example of a 48-block file being compressed to 32 blocks.
(b) The MFT record for the file after compression.

NTFS File Encryption

Encrypting File System (EFS) sits above NTFS, below the Win32 API.

[Figure labels: K retrieved; user's public key]