CISC 3320 File System Implementation - GitHub Pages
Transcript of CISC 3320 File System Implementation - GitHub Pages
CISC 3320
File System
ImplementationHui Chen
Department of Computer & Information Science
CUNY Brooklyn College
12/2/2019 1CUNY | Brooklyn College
Acknowledgement
• These slides are a revision of the slides
provided by the authors of the textbook
via the publisher of the textbook
12/2/2019 CUNY | Brooklyn College 2
Outline
• File-System Structure
• File-System Operations
• Directory Implementation
• Allocation Methods
• Free-Space Management
• Efficiency and Performance
• Recovery
• Example: WAFL File System
12/2/2019 CUNY | Brooklyn College 3
File-System Structure
• File structure
• Logical storage unit
• Collection of related information
• File system resides on secondary storage (disks)
• Provided user interface to storage, mapping logical to physical
• Provides efficient and convenient access to disk by allowing data to be stored, located retrieved easily
• Disk provides in-place rewrite and random access
• I/O transfers performed in blocks of sectors (usually 512 bytes)
• File control block (FCB) – storage structure consisting of information about a file
• Device driver controls the physical device
• File system organized into layers
12/2/2019 CUNY | Brooklyn College 4
File-System
Layers
12/2/2019 CUNY | Brooklyn College 5
File System Layers: Device
Drivers• Device drivers manage I/O devices at
the I/O control layer
• Outputs low-level hardware specific
commands to hardware controller
• Example:
• Given commands like “read drive1, cylinder 72,
track 2, sector 10, into memory location 1060”
(CHS addressing) or “read drive1, block 1442, into
memory location 1060” (LBA)
12/2/2019 CUNY | Brooklyn College 6
File System Layers: Basic File
System• Basic file system
• Given command like “retrieve block 123”
translates to device driver
• Also manages memory buffers and
caches (allocation, freeing, replacement)
• Buffers hold data in transit
• Caches hold frequently used data
12/2/2019 CUNY | Brooklyn College 7
File System Layers: File
Organization Module• File organization module understands
files, logical address, and physical blocks
• Translates logical block # to physical block #
• Manages free space, disk allocation
12/2/2019 CUNY | Brooklyn College 8
File System Layers: Logical
File System• Logical file system manages metadata
information
• Translates file name into file number, file
handle, location by maintaining file control
blocks (inodes in UNIX)
• Directory management
• Protection
12/2/2019 CUNY | Brooklyn College 9
File System Layers: Benefits
and Overhead• Layering useful for reducing complexity
and redundancy
• Duplication of code is minimized.
• The I/O control and sometimes the basic file-
system code can be used by multiple file systems.
• Each file system can then have its own logical file-
system and file-organization modules.
• But layering adds overhead and can
decrease performance
12/2/2019 CUNY | Brooklyn College 10
File Systems: Examples
• Many file systems, sometimes many within an operating system
• Each with its own format, e.g.,
• CD-ROM is ISO 9660;
• Unix has UFS, FFS;
• Windows has FAT, FAT32, NTFS as well as floppy, CD, DVD Blu-ray;Linux has more than 130 types, with extended file system ext3 and ext4 leading; plus distributed file systems, etc.)
• New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
12/2/2019 CUNY | Brooklyn College 11
Questions?
• File system structure and layering
12/2/2019 CUNY | Brooklyn College 12
File-System Operations
• On-disk and in-memory structures
• We have system calls at the API level, but how do we implement their functions?
• Boot control block contains info needed by system to boot OS from that volume
• Needed if volume contains OS, usually first block of volume
• Volume control block (superblock, master file table) contains volume details
• Total # of blocks, # of free blocks, block size, free block pointers or array
• Directory structure organizes the files
• Names and inode numbers, master file table12/2/2019 CUNY | Brooklyn College 13
File-System Operations:
Control Blocks• Boot control block contains info needed by system to boot OS from that volume
• Needed if volume contains OS, usually first block of volume
• Volume control block (superblock, master file table) contains volume details
• Total # of blocks, # of free blocks, block size, free block pointers or array
12/2/2019 CUNY | Brooklyn College 14
File-System Operations:
Directory Structures• Directory structure organizes the files
• Names and File Control Block (e.g., inode)
numbers, master file table
12/2/2019 CUNY | Brooklyn College 15
File-System Operations: Per-
File File Control Block• Per-file File Control Block (FCB)
contains many details about the file
• typically inode number, permissions, size,
dates
• NFTS stores into in master file table using
relational DB structures
12/2/2019 CUNY | Brooklyn College 16
In-Memory File System
Structures• Mount table storing file system mounts, mount
points, file system types
• system-wide open-file table contains a copy of the FCB of each file and other info
• per-process open-file table contains pointers to appropriate entries in system-wide open-file table as well as other info
• Examples: (a) opening file;(b) reading a file
• buffers hold data blocks from secondary storage
• Open returns a file handle for subsequent use
• Data from read eventually copied to specified user process memory address
12/2/2019 CUNY | Brooklyn College 17
12/2/2019 CUNY | Brooklyn College 18
Directory Implementation
• Linear list of file names with pointer to the data blocks
• Simple to program
• Time-consuming to execute
• Linear search time
• Could keep ordered alphabetically via linked list or use B+ tree
• Hash Table – linear list with hash data structure
• Decreases directory search time
• Collisions – situations where two file names hash to the same location
• Only good if entries are fixed size, or use chained-overflow method
12/2/2019 CUNY | Brooklyn College 19
Questions?
• File system operations and supporting
data structures
• On-disk structures
• In-memory structures
• Directory implementation
12/2/2019 CUNY | Brooklyn College 20
Allocation Methods
• Continuous allocation
• Linked allocation
• Indexed allocation
• Combined scheme
12/2/2019 CUNY | Brooklyn College 21
Allocation Methods:
Contiguous• An allocation method refers to how disk blocks
are allocated for files:
• Contiguous allocation – each file occupies set of contiguous blocks
• Best performance in most cases
• Simple
• only starting location (block #) and length (number of blocks) are required
• Problems include
• finding space for file, knowing file size, external fragmentation, need for compaction off-line (downtime) or on-line
12/2/2019 CUNY | Brooklyn College 22
12/2/2019 CUNY | Brooklyn College 23
• Mapping from logical to physical
LA/512
Q
R
Block to be accessed = Q +
starting address
Displacement into block = R
Extent-Based Systems
• A modified contiguous allocation scheme
used in many newer file systems (e.g.,
Veritas File System)
• Extent-based file systems allocate disk
blocks in extents
• An extent is a contiguous block of disks
• Extents are allocated for file allocation
• A file consists of one or more extents
12/2/2019 CUNY | Brooklyn College 24
Allocation Methods - Linked
• Linked allocation – each file a linked list of blocks
• File ends at nil pointer
• No external fragmentation
• Each block contains pointer to next block
• No compaction, external fragmentation
• Free space management system called when new block needed
• Improve efficiency by clustering blocks into groups but increases internal fragmentation
• Reliability can be a problem
• Locating a block can take many I/Os and disk seeks
12/2/2019 CUNY | Brooklyn College 25
Linked Allocation
• Each file is a linked list of disk blocks: blocks
may be scattered anywhere on the disk
• Mapping
• Block to be accessed is the Qth block in the
linked chain of blocks representing the file.
• Displacement into block = R + 1
12/2/2019 CUNY | Brooklyn College 26
pointerblock =
LA/511
Q
R
12/2/2019 CUNY | Brooklyn College 27
FAT Variation
• FAT (File Allocation Table) variation
• Much like a linked list, but faster on disk and
cacheable
• Beginning of volume has table, indexed by
block number
• New block allocation simple
12/2/2019 CUNY | Brooklyn College 28
File-Allocation Table
12/2/2019 CUNY | Brooklyn College 29
Allocation Methods - Indexed
• Indexed allocation
• Each file has its own index block(s) of
pointers to its data blocks
• Logical view
12/2/2019 CUNY | Brooklyn College 30
index table
12/2/2019 CUNY | Brooklyn College 31
Indexed Allocation
• Need index table
• Random access
• Dynamic access without external fragmentation, but have overhead of index block
• Mapping from logical to physical in a file of maximum size of 256K bytes and block size of 512 bytes. We need only 1 block for index table
12/2/2019 CUNY | Brooklyn College 32
LA/512
Q
R
Q = displacement into index tableR = displacement into block
Mapping in Indexed Allocation
• Mapping from logical to physical in a file of unbounded length (block size of 512 words)
• Linked scheme – Link blocks of index table (no limit on size)
12/2/2019 CUNY | Brooklyn College 33
LA / (512 x 511)
Q1
R1
Q1 = block of index table
R1 is used as follows:
R1 / 512
Q2
R2
Q2 = displacement into block of index table
R2 displacement into block of file:
• Two-level index (4K blocks could store
1,024 four-byte pointers in outer index
→ 1,048,567 data blocks and file size of
up to 4GB)
12/2/2019 CUNY | Brooklyn College 34
12/2/2019 CUNY | Brooklyn College 35
LA / (512 x 512)
Q1
R1
Q1 = displacement into outer-index
R1 is used as follows:
R1 / 512
Q2
R2
Q2 = displacement into block of index table
R2 displacement into block of file:
12/2/2019 CUNY | Brooklyn College 36
Combined Scheme: UNIX
UFS
12/2/2019 CUNY | Brooklyn College 37
More index blocks than can be addressed with 32-bit file pointer
4K bytes per block, 32-bit addresses
Performance• Best method depends on file access type
• Contiguous great for sequential and random
• Linked good for sequential, not random
• Declare access type at creation -> select either contiguous or linked
• Indexed more complex
• Single block access could require 2 index block reads then data block read
• Clustering can help improve throughput, reduce CPU overhead
• For NVM, no disk head so different algorithms and optimizations needed
• Using old algorithm uses many CPU cycles trying to avoid non-existent head movement
• With NVM goal is to reduce CPU cycles and overall path needed for I/O
12/2/2019 CUNY | Brooklyn College 38
Performance
• Adding instructions to the execution path to save one disk I/O is reasonable
• Intel Core i7 Extreme Edition 990x (2011) at 3.46Ghz = 159,000 MIPS
• http://en.wikipedia.org/wiki/Instructions_per_second
• Typical disk drive at 250 I/Os per second
• 159,000 MIPS / 250 = 630 million instructions during one disk I/O
• Fast SSD drives provide 60,000 IOPS
• 159,000 MIPS / 60,000 = 2.65 millions instructions during one disk I/O
12/2/2019 CUNY | Brooklyn College 39
Questions?
• Continuous allocation
• Linked allocation
• Indexed allocation
• Combined scheme
12/2/2019 CUNY | Brooklyn College 40
Free-Space Management
• File system maintains free-space list to
track available blocks/clusters
• (Using term “block” for simplicity)
12/2/2019 CUNY | Brooklyn College 41
Bit Vector/Map
• (n blocks)
12/2/2019 CUNY | Brooklyn College 42
…
0 1 2 n-1
bit[i] = 1 block[i] free
0 block[i] occupied
Block number calculation
(number of bits per word) *
(number of 0-value words) +
offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit
• Bit map requires extra space
• Example:
block size = 4KB = 212 bytes
disk size = 240 bytes (1 terabyte)
n = 240/212 = 228 bits (or 32MB)
if clusters of 4 blocks -> 8MB of memory
• Easy to get contiguous files
12/2/2019 CUNY | Brooklyn College 43
Linked Free Space List on
Disk
12/2/2019 CUNY | Brooklyn College 44
• Linked list (free list)
• Cannot get contiguous
space easily
• No waste of space
• No need to traverse the
entire list (if # free blocks
recorded)
Grouping and Counting
• Grouping • Modify linked list to store address of next n-1
free blocks in first free block, plus a pointer to next block that contains free-block-pointers (like this one)
• Counting• Because space is frequently contiguously used
and freed, with contiguous-allocation allocation, extents, or clustering
• Keep address of first free block and count of following free blocks
• Free space list then has entries containing addresses and counts
12/2/2019 CUNY | Brooklyn College 45
Space Maps• Used in ZFS
• Consider meta-data I/O on very large file systems
• Full data structures like bit maps couldn’t fit in memory -> thousands of I/Os
• Divides device space into metaslab units and manages metaslabs
• Given volume can contain hundreds of metaslabs
• Each metaslab has associated space map
• Uses counting algorithm
• But records to log file rather than file system
• Log of all block activity, in time order, in counting format
• Metaslab activity -> load space map into memory in balanced-tree structure, indexed by offset
• Replay log into that structure
• Combine contiguous free blocks into single entry12/2/2019 CUNY | Brooklyn College 46
TRIMing Unused Blocks
• HDDS overwrite in place so need only free list
• Blocks not treated specially when freed
• Keeps its data but without any file pointers to it, until overwritten
• Storage devices not allowing overwrite (like NVM) suffer badly with same algorithm
• Must be erased before written, erases made in large chunks (blocks, composed of pages) and are slow
• TRIM is a newer mechanism for the file system to inform the NVM storage device that a page is free
• Can be garbage collected or if block is free, now block can be erased
12/2/2019 CUNY | Brooklyn College 47
Efficiency
• Efficiency dependent on:
• Disk allocation and directory algorithms
• Types of data kept in file’s directory entry
• Pre-allocation or as-needed allocation of
metadata structures
• Fixed-size or varying-size data structures
12/2/2019 CUNY | Brooklyn College 48
Performance
• Performance
• Keeping data and metadata close together
• Buffer cache – separate section of main memory for frequently used blocks
• Synchronous writes sometimes requested by apps or needed by OS
• No buffering / caching – writes must hit disk before acknowledgement
• Asynchronous writes more common, buffer-able, faster
• Free-behind and read-ahead – techniques to optimize sequential access
• Reads frequently slower than writes
12/2/2019 CUNY | Brooklyn College 49
Page Cache
• A page cache caches pages rather than
disk blocks using virtual memory
techniques and addresses
• Memory-mapped I/O uses a page cache
• Routine I/O through the file system uses
the buffer (disk) cache
12/2/2019 CUNY | Brooklyn College 50
I/O Without a Unified Buffer
Cache
12/2/2019 CUNY | Brooklyn College 51
Unified Buffer Cache
• A unified buffer cache uses the same
page cache to cache both memory-
mapped pages and ordinary file system
I/O to avoid double caching
• But which caches get priority, and what
replacement algorithms to use?
12/2/2019 CUNY | Brooklyn College 52
I/O Using a Unified Buffer
Cache
12/2/2019 CUNY | Brooklyn College 53
Questions?
• Efficiency
• Performance
• Cache
12/2/2019 CUNY | Brooklyn College 54
Recovery
• Consistency checking – compares data in
directory structure with data blocks on disk,
and tries to fix inconsistencies
• Can be slow and sometimes fails
• Use system programs to back up data from
disk to another storage device (magnetic
tape, other magnetic disk, optical)
• Recover lost file or disk by restoring data
from backup
12/2/2019 CUNY | Brooklyn College 55
Log Structured File Systems
• Log structured (or journaling) file systems record each metadata update to the file system as a transaction
• All transactions are written to a log
• A transaction is considered committed once it is written to the log (sequentially)
• Sometimes to a separate device or section of disk
• However, the file system may not yet be updated
• The transactions in the log are asynchronously written to the file system structures
• When the file system structures are modified, the transaction is removed from the log
• If the file system crashes, all remaining transactions in the log must still be performed
• Faster recovery from crash, removes chance of inconsistency of metadata
12/2/2019 CUNY | Brooklyn College 56
Example: WAFL File System
• Used on Network Appliance “Filers” –distributed file system appliances
• “Write-anywhere file layout”
• Serves up NFS, CIFS, http, ftp
• Random I/O optimized, write optimized
• NVRAM for write caching
• Similar to Berkeley Fast File System, with extensive modifications
12/2/2019 CUNY | Brooklyn College 57
The WAFL File Layout
12/2/2019 CUNY | Brooklyn College 58
Snapshots in
WAFL
12/2/2019 CUNY | Brooklyn College 59
Questions?
• Recovery
• Log structured file systems
• WAFL
12/2/2019 CUNY | Brooklyn College 60