Local Filesystems (part 1)
CPS210, Spring 2006
Papers
The Design and Implementation of a Log-Structured File System (Mendel Rosenblum)
File System Logging Versus Clustering: A Performance Comparison (Margo Seltzer)
Surface organized into tracks
Parallel tracks form cylinders
Tracks broken up into sectors
Disk head position
Rotation is counter-clockwise
Animation: the head is about to read the BLUE sector; the RED request is scheduled next
  After the BLUE read: seek to RED's track (seek latency)
  Then wait for the RED sector to rotate under the head (rotational latency)
  Then read the RED sector
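To see why seeks and rotation dominate small transfers, here is a back-of-the-envelope sketch with made-up drive parameters (the numbers below are illustrative assumptions, not from the slides):

```python
# Hypothetical drive parameters for a circa-2006 disk (assumptions).
AVG_SEEK_MS = 8.0        # average seek time
RPM = 7200               # spindle speed
TRACK_KB = 512           # data per track
REQUEST_KB = 8           # one 8 KB file system block

rotation_ms = 60_000 / RPM               # time for one full rotation
rot_latency_ms = rotation_ms / 2         # on average, wait half a rotation
transfer_ms = (REQUEST_KB / TRACK_KB) * rotation_ms

total_ms = AVG_SEEK_MS + rot_latency_ms + transfer_ms
print(f"seek {AVG_SEEK_MS:.2f} ms + rotate {rot_latency_ms:.2f} ms "
      f"+ transfer {transfer_ms:.2f} ms = {total_ms:.2f} ms")
# -> roughly 8 + 4.17 + 0.13 = 12.3 ms, almost all of it positioning time
```

Under these assumptions an 8 KB read spends about 99% of its time positioning the head, which is the cost LFS tries to eliminate for writes.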
Unix index blocks
Intuition
  Many files are small (length = 0, length = 1, length < 80, ...)
  Some files are huge (3 gigabytes)
"Clever heuristic" in the Unix FFS inode
  12 (direct) block pointers: 12 * 8 KB = 96 KB
    Availability is "free": you need the inode to open() the file anyway
  3 indirect block pointers: single, double, triple
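A minimal sketch of the arithmetic behind this layout, assuming 4-byte block pointers (the slides give only the 8 KB block size; the pointer width is an assumption):

```python
BLOCK = 8 * 1024      # 8 KB blocks, per the slide
PTR = 4               # assumed 4-byte block pointers
PPB = BLOCK // PTR    # pointers per indirect block: 2048

reach = {
    "12 direct":       12 * BLOCK,        # 96 KB, free with the inode
    "single indirect": PPB * BLOCK,       # +16 MB
    "double indirect": PPB**2 * BLOCK,    # +32 GB
    "triple indirect": PPB**3 * BLOCK,    # +64 TB
}
for level, nbytes in reach.items():
    print(f"{level:16s} adds {nbytes / 2**20:>14,.2f} MB")
```

Small files are fully served by the 12 direct pointers, while even a 3 GB file is reachable once the double-indirect level is in play.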
Unix index blocks
[Figure sequence: diagrams of an inode's block map filling in as a file grows. The inode holds 12 direct block pointers plus single-, double-, and triple-indirect pointers; -1 marks an unused pointer. Successive slides show the direct pointers filling first, then the single-indirect block, then the double-indirect blocks.]
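To make the figure sequence concrete, here is a hypothetical routine (names are illustrative, not kernel source) that maps a logical block number to the pointer level that resolves it:

```python
BLOCK, PTR = 8 * 1024, 4
PPB = BLOCK // PTR        # 2048 pointers per 8 KB indirect block
NDIRECT = 12

def classify(n):
    """Which inode pointer level resolves logical block n of a file?"""
    if n < NDIRECT:
        return ("direct", n)
    n -= NDIRECT
    if n < PPB:
        return ("single-indirect", n)
    n -= PPB
    if n < PPB**2:
        return ("double-indirect", divmod(n, PPB))  # (outer slot, inner slot)
    n -= PPB**2
    return ("triple-indirect", n)

print(classify(5))      # ('direct', 5)
print(classify(100))    # ('single-indirect', 88)
print(classify(5000))   # ('double-indirect', (1, 892))
```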
Log-structured file system
What is the high-level motivation?
  Caches are getting bigger, so disk reads are less important
  Disk traffic will be dominated by writes
  Most writes are small metadata updates; consecutive small writes will trigger seeks
  Some file systems perform these writes synchronously
Why a log?
  Eliminate seeks (make all disk writes sequential)
  Easy crash recovery
LFS challenges
Writes are easier; what about reads?
How do we ensure large open spaces? Why does this matter?
LFS on-disk structures
Segment cleaner
LFS requires large open spaces; fragmentation will kill performance
Use the notion of segments
  Large contiguous areas of live/dead data, 512 KB or 1 MB
The segment cleaner defragments the disk
Separate the old from the young
  Old data rarely changes, so clean the two differently
Threading vs. copying
LFS threads the log between segments and copies live data out of segments when cleaning
Segment cleaning
Three steps:
  Read segments into memory
  Identify live blocks in those segments
  Write live blocks to a smaller number of clean segments
Must also update the file inodes to point at the new copies
Uses a segment block summary, kept for each segment (next slide)
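A sketch of those three steps as a loop. The callbacks (read_segment, append_to_log, update_inode) and the is_live test stand in for machinery the slides describe next; all names here are hypothetical:

```python
def clean(candidate_segments, read_segment, is_live, append_to_log, update_inode):
    """Illustrative cleaning pass: compact live blocks out of dirty segments."""
    for seg in candidate_segments:
        blocks = read_segment(seg)              # step 1: read segment into memory
        for blk in blocks:
            if not is_live(blk):                # step 2: identify live blocks
                continue                        #   dead blocks are simply dropped
            new_addr = append_to_log(blk.data)  # step 3: rewrite live blocks
            update_inode(blk.file_no, blk.block_no, new_addr)  # repoint the inode
        # seg is now entirely dead and can be marked clean for reuse
```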
Segment block summaries
The segment block summary (SBS) contains info about the blocks in a segment
For each file data block, the SBS records the file number and block number
Also used to identify live/dead blocks:
  Use the file number and block number from the SBS to look up the actual inode
  If the inode still points at this block, the block is live; if different, it is dead
Optimization: just keep inode version numbers; if a block's recorded version no longer matches the file's current version, the block is dead without examining the inode
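A sketch of that liveness test over hypothetical in-memory structures (the dict layouts are illustrative; LFS keeps this information in the segment block summary and inode map):

```python
def is_live(sbs_entry, inode_blocks, inode_versions):
    """Is one data block, described by its segment block summary entry, live?

    sbs_entry:      (file_no, block_no, version, disk_addr) from the SBS
    inode_blocks:   file_no -> {block_no: current disk address}
    inode_versions: file_no -> current inode version number
    """
    file_no, block_no, version, disk_addr = sbs_entry

    # Optimization from the slide: a stale version means the file was
    # deleted or truncated, so the block is dead without reading the inode.
    if inode_versions.get(file_no) != version:
        return False

    # Otherwise consult the actual inode: live only if the inode still
    # points at this exact on-disk copy of the block.
    return inode_blocks.get(file_no, {}).get(block_no) == disk_addr
```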
Cleaner policy questions
When should the cleaner run? Continuously, at night, at high utilization?
How many segments per cleaning? Tens, hundreds, ...
Which segments should be cleaned?
How should live blocks be grouped?
Write cost
The ratio of total disk traffic (including cleaning) to new data written
  1.0 is perfect (we only write new data)
  10.0 isn't great (only 1/10 of the data written is new)
Ideal: a bimodal distribution of segments
  High utilization in old segments
  Low utilization in young segments
  Combines high disk utilization with low write cost
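These numbers follow from segment utilization u: cleaning reads a whole segment, writes back the live fraction u, and frees (1 - u) of a segment for new data, giving a cost of 2 / (1 - u). A small sketch of that formula (the derivation is from the Rosenblum paper; the code itself is illustrative):

```python
def write_cost(u):
    """Disk I/O per byte of new data when cleaning segments at utilization u."""
    if u == 0.0:
        return 1.0   # an entirely dead segment need not even be read
    # read 1 segment + write back u live + write (1 - u) new, per (1 - u) new
    return (1 + u + (1 - u)) / (1 - u)   # = 2 / (1 - u)

for u in (0.0, 0.5, 0.8, 0.9):
    print(f"u = {u:.1f}: write cost = {write_cost(u):.1f}")
# u = 0.8 gives write cost 10.0: only 1/10 of what we write is new data
```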
Initial simulation results
Surprise!
Why the surprise?
Cold segments decay slowly, collecting dead blocks. In aggregate, they contain a lot of free blocks.
Instead of greatest yield, use cost-benefit analysis:
  benefit/cost = (yield * age) / cost = ((1 - u) * age) / (1 + u)
  where u is the segment's utilization: cleaning reads the whole segment (1) and writes back its live fraction (u), freeing (1 - u) of a segment
Result: clean cold segments at a higher utilization than hot segments
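A sketch of that policy over some made-up segments, showing why it prefers cold segments even at high utilization (the segment data is invented for illustration):

```python
def benefit_cost(u, age):
    """Cost-benefit score: (free space generated * age of data) / cleaning cost."""
    return (1 - u) * age / (1 + u)

# (segment id, utilization u, age of its youngest data) -- all hypothetical
segments = [("hot-1", 0.30, 5), ("hot-2", 0.50, 10),
            ("cold-1", 0.75, 500), ("cold-2", 0.90, 1000)]

for seg_id, u, age in sorted(segments,
                             key=lambda s: benefit_cost(s[1], s[2]),
                             reverse=True):
    print(f"{seg_id:6s} u={u:.2f} age={age:4d} score={benefit_cost(u, age):6.1f}")
# The cold segments rank first despite u of 0.75-0.90: exactly the
# "clean cold segments at higher utilization" result.
```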
What do the LFS results really mean?
Workloads matter
  When is LFS better than FFS? When is FFS better than LFS?
More terminology
Blocks; partial blocks, aka fragments; contiguous ranges of blocks, aka clusters
Want to allocate inodes + data in the same cylinder (no seeks)
Want data in clusters on the same track (also no seeks)
Challenges for FFS
Reduce (eliminate?) synchronous writes
Avoid fragmentation
Why are meta-data writes synchronous?
Sequential performance vs. file size
Four-phase benchmark: create, read, overwrite, delete
Ideal conditions: blank file system, no cleaner
[Graph slides omitted: create, read, overwrite, and delete results]