Post on 13-Jul-2020
ffsck: The Fast File System Checker
Ao Ma*+, Chris Dragga+, Andrea C. Arpaci-Dusseau+, Remzi H. Arpaci-Dusseau+
* Backup Recovery Systems Division, EMC
+ University of Wisconsin, Madison
Background
File system
• Data integrity is critically important
• Should be robust and reliable
Factors that can corrupt a file system
• Unclean shutdown / system crash
• File system bugs
• Hardware failures
Existing Solutions
• Unclean shutdown / system crash: journaling, copy-on-write, soft updates
• Bugs: debugging, bug-finding tools
• Hardware failures: checksums, scrubbing
These reduce the probability of faults but can’t protect against all of them
fsck: Last Resort To Repair
Approach
• Scan the file system offline
• Check metadata redundancy information
• Restore a damaged file system to a usable state
Dilemma
• fsck is slow, causing long downtime
• Checking time is unpredictable
fsck Challenges
File systems are evolving
• Capacity is increasing
• Complexity is growing (more bugs and hardware failures)
Impact on fsck
• Longer downtime
• Needs to run more frequently
What we did
Fast file system checker (ffsck)
• 10 times faster with identical checking policy
A modified version of ext3 (rext3)
• 20% improvement in big writes
• 43% improvement in random reads
• 10% degradation in small reads
Outline
fsck analysis
• Checking approach
• Performance
• FS tradeoffs
ffsck & rext3
• Goals
• Novel features
Evaluation
E2fsck Checking Approach
Phase  Checking tasks
1      Scan inodes and indirect blocks in a logical order
2      Check each directory individually
3      Check directory connectivity
4      Check inode reference counts and remove orphan inodes
5      Update the on-disk copies if necessary
Performance Analysis
750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28
Initialize the disk image by creating directories with small files (4KB-2MB)
Increase the FS size by creating new files and appending data to files (4KB-1MB)
E2fsck doesn’t scale well
Phase 1 dominates the checking time
[Figure: e2fsck total checking time (seconds) by FS size, broken down into phases 1-5]
FS size   Total checking time (seconds)
150 GB    1666
300 GB    2554
450 GB    3398
600 GB    4176
Cumulative Time Spent on Reading Indirect Blocks and Inode Blocks
[Figure: cumulative read time (milliseconds) vs. read block number, for inode blocks and indirect blocks]
Indirect blocks are the bottleneck
File System Design Tradeoffs
Identical allocation for data and indirect blocks
• Pros: store them contiguously and facilitate sequential access
• Cons: metadata is scattered across the disk
Rely on tree structure to locate indirect blocks
• Pros: simple and straightforward
• Cons: imposes a strict ordering of access
Fsck: An Afterthought Design
Repairing capability is not prioritized
• The file system provides limited support for the checker
• The checker is developed as a peripheral addition rather than an integral component
Outline
fsck analysis
• Checking approach
• Performance
• FS tradeoffs
ffsck & rext3
• Goals
• Novel features
Evaluation
Ffsck and Rext3
Prioritize fast repair when designing the file system
• Fast scan
• Robust checking performance
• Competitive FS performance
ffsck and rext3 are based on e2fsck and ext3, respectively
Basic Layout of Ext3
[Diagram: the disk is divided into block groups (Block Group 0 … Block Group i … Block Group n); each block group holds a super block, group descriptor, data bitmap, inode bitmap, inode table, and data blocks]
Overview of ffsck and rext3
New disk layout
Disk-order scan
Self-check and Cross-check
Fast recovery with bitmap snapshot
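Rext3 Disk Layout
Decouple allocation
• Indirect blocks and directory data blocks go into a separate indirect region
Improve metadata density
[Diagram: ext3 block group: super block, group descriptor, data bitmap, inode bitmap, inode table, data blocks. rext3 block group: super block, group descriptor, data bitmap, inode bitmap, inode table, indirect region, data blocks]
[Diagram: new allocation example with an inode (Ino), indirect blocks (Ind1, Ind2), and data blocks (D1 …, Da, Db)]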
Rext3 Disk Layout
Additional seeks?
• Mitigated by the disk track buffer
Track buffer
[Diagram: a disk track with sectors 1-16, the spindle, the direction of rotation, and the heads reading data]
Size keeps increasing: 16MB, 32MB, 64MB, 128MB
Disk-order Scan
• Most efficient way to scan metadata
• Predictable scanning time
[Diagram: block groups 0, 1, …, n each consist of a metadata region (super block, group descriptor, data bitmap, inode bitmap, inode table, indirect region) followed by a data region; the scan alternates read (metadata region) and seek (over the data region)]
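To illustrate why scanning in disk order can beat following the pointer tree, here is a minimal Python sketch. It is a toy model, not ffsck's code: the `files` layout, the LBAs, and the helper names are all hypothetical, and seek cost is approximated by the LBA distance between consecutive reads.

```python
# Hypothetical toy model: each file is an inode block plus the indirect
# blocks reachable from it, identified only by logical block address (LBA).
files = {
    "a": {"inode": 10, "indirect": [9000, 300]},
    "b": {"inode": 11, "indirect": [4500, 8800]},
    "c": {"inode": 12, "indirect": [120, 7000]},
}


def logical_order(files):
    """Visit each inode, then follow its pointers (tree order)."""
    order = []
    for f in files.values():
        order.append(f["inode"])
        order.extend(f["indirect"])
    return order


def disk_order(files):
    """Visit every metadata block in ascending LBA order."""
    return sorted(logical_order(files))


def seek_distance(order):
    """Approximate head travel as the sum of LBA gaps between reads."""
    return sum(abs(b - a) for a, b in zip(order, order[1:]))


logical_cost = seek_distance(logical_order(files))
disk_cost = seek_distance(disk_order(files))
print("logical order:", logical_cost)
print("disk order:   ", disk_cost)
```

In this model the disk-order cost depends only on the span of addresses that hold metadata, not on how files point at them, which is one way to read the "predictable scanning time" claim.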
Memory Pressure
Disk-order scan accesses all the indirect blocks without using the indirect tree
• Can’t perform checking until all the related metadata is cached
• Impractical for large-scale file systems
Self-check, Cross-check
Separate self-check and cross-check
• A self-ID is added
• Finish most checks without referring to other metadata (self-check)
Self-check and discard
• Once the self-check is done, discard the fields that the cross-check does not need
Example: Self-check & Discard
Indirect block, disk copy: ino, Blk 11, Blk 12, Blk …, Blk 94
Self-check: 1. block range check 2. bitmap
Indirect block, memory copy (after self-check): ino, its own LBA, Blk_num = 84, last pointer offset = 83, last pointer: 94
Compression ratio is nearly 250:1
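The self-check-and-discard step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ffsck's implementation: the `IndirectSummary` type, the function name, the `problems` list, the inode number 7, the LBA 500, and the convention that a pointer value of 0 marks an unused slot are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class IndirectSummary:
    """Compact in-memory copy kept for the later cross-check."""
    ino: int           # self-ID: inode that owns this indirect block
    lba: int           # the indirect block's own address
    blk_num: int       # number of valid pointers
    last_offset: int   # slot index of the last valid pointer
    last_pointer: int  # value of the last valid pointer


def self_check_and_discard(ino, lba, pointers, total_blocks, bitmap, problems):
    """Range-check each pointer, record it in the bitmap, keep a summary."""
    blk_num, last_offset = 0, -1
    for offset, ptr in enumerate(pointers):
        if ptr == 0:                      # assumed marker for an unused slot
            continue
        if not 0 < ptr < total_blocks:    # 1. block range check
            problems.append((lba, offset, "out of range"))
            continue
        if bitmap[ptr]:                   # 2. bitmap: block already claimed
            problems.append((lba, offset, "already claimed"))
        bitmap[ptr] = True
        blk_num += 1
        last_offset = offset
    last_pointer = pointers[last_offset] if last_offset >= 0 else 0
    return IndirectSummary(ino, lba, blk_num, last_offset, last_pointer)


# The slide's example: pointers to blocks 11..94, rest of the block unused.
pointers = list(range(11, 95)) + [0] * (1024 - 84)
bitmap = [False] * 1000
problems = []
summary = self_check_and_discard(7, 500, pointers, 1000, bitmap, problems)
print(summary)
```

After the call, the 1024-entry pointer array is no longer needed; only the five summary fields stay in memory.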
Example: Cross-check of File Size
Inode: last pointer offset: 13, last pointer: 36
Double indirect block (LBA: 36): last pointer offset: 2, last pointer: 157
Indirect block (LBA: 157): last pointer offset: 12, last pointer: 950
1. Partially rebuild the tree structure
2. Calculate the file size using the offsets
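The file-size calculation for this example can be worked through in a short Python sketch. It is illustrative only and rests on assumptions the slide does not state: 4KB blocks, 4-byte block pointers (1024 per block), no holes in the file, and the ext3 inode layout in which slots 0-11 are direct pointers, slot 12 is the single-indirect pointer, and slot 13 is the double-indirect pointer. The function names are hypothetical.

```python
BLOCK_SIZE = 4096                  # assumption: 4KB blocks
PTRS_PER_BLOCK = BLOCK_SIZE // 4   # assumption: 4-byte pointers
NDIRECT = 12                       # ext3: inode slots 0-11 are direct
DIND_SLOT = 13                     # ext3: slot 13 is double-indirect


def blocks_from_offsets(inode_last, dind_last, ind_last):
    """Count data blocks of a hole-free file ending under the double-indirect tree."""
    if inode_last != DIND_SLOT:
        raise ValueError("sketch only covers files ending in the double-indirect tree")
    blocks = NDIRECT                      # all direct slots are full
    blocks += PTRS_PER_BLOCK              # the single-indirect block is full
    blocks += dind_last * PTRS_PER_BLOCK  # full indirect blocks before the last one
    blocks += ind_last + 1                # slots 0..ind_last of the last indirect block
    return blocks


def size_is_consistent(i_size, blocks):
    """The inode's size must end somewhere inside the last data block."""
    return (blocks - 1) * BLOCK_SIZE < i_size <= blocks * BLOCK_SIZE


# Offsets from the slide: inode 13, double indirect 2, indirect 12.
n_blocks = blocks_from_offsets(13, 2, 12)
print(n_blocks, "blocks")
print(size_is_consistent(n_blocks * BLOCK_SIZE, n_blocks))
```

Only the last-pointer offsets along one path of the tree are needed, which is why the tree has to be rebuilt only partially.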
Fast Recovery with Bitmap Snapshot
e2fsck: costly double scan of inodes and indirect blocks
• Detect multiply-claimed blocks in the 1st scan
• Detect their owners during the 2nd scan
ffsck: 1 full scan + 1 partial rescan
• ffsck builds a list of bitmap snapshots to limit the rescan’s scope
Fast Recovery with Bitmap Snapshot
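Create a snapshot for each group of inodes
[Diagram: block bitmap for blocks 1-7, captured after each group of inodes is scanned]
Block      1 2 3 4 5 6 7
initial    0 0 0 0 0 0 0
snapshot1  1 1 0 1 0 0 0
snapshot2  1 1 1 1 1 1 0
snapshot3  1 1 1 1 1 1 1
Only need to rescan the group of inodes for snapshot1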
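A small Python sketch can show how bitmap snapshots narrow the rescan. It is a simplified model under assumptions, not ffsck's code: snapshots are taken as cumulative copies of the bitmap after each inode group, the function names are hypothetical, and the duplicate claim on block 2 by the third group is invented here to reproduce the bitmaps shown on the slide.

```python
def scan_with_snapshots(inode_groups, num_blocks):
    """Full scan: mark claimed blocks, snapshot the bitmap after each group."""
    bitmap = [0] * num_blocks
    snapshots, duplicates = [], set()
    for group in inode_groups:
        for blocks in group.values():
            for b in blocks:
                if bitmap[b]:          # second claim on the same block
                    duplicates.add(b)
                bitmap[b] = 1
        snapshots.append(bitmap.copy())
    return snapshots, duplicates


def groups_to_rescan(snapshots, duplicates):
    """The first snapshot with the bit set names the group of the first claimant."""
    groups = set()
    for b in duplicates:
        for index, snap in enumerate(snapshots):
            if snap[b]:
                groups.add(index)
                break
    return groups


# Blocks are numbered 1-7 as on the slide; index 0 is unused.
inode_groups = [
    {101: [1, 2], 102: [4]},      # yields snapshot1 = 1101000
    {201: [3, 5], 202: [6]},      # yields snapshot2 = 1111110
    {301: [7, 2]},                # yields snapshot3; block 2 is claimed twice
]
snapshots, duplicates = scan_with_snapshots(inode_groups, 8)
rescan = groups_to_rescan(snapshots, duplicates)
print("duplicates:", duplicates, "rescan groups:", rescan)
```

The later claimant is known at the moment the duplicate is detected, so only the group that first set the bit has to be read again.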
Summary
Decouple allocation
• Improve metadata density
Disk-order scan
• Most efficient scan approach
Self-check & cross-check
• Avoid memory saturation
Fast recovery with bitmap snapshot
• One full scan + partial rescan
Outline
fsck analysis
• Checking approach
• Performance
• FS tradeoffs
ffsck & rext3
• Goals
• Novel features
Evaluation
Checking Performance Comparison
Question 1: Will ffsck scale well?
Question 2: Can ffsck perform consistently as the FS ages?
Checking Time Comparison
[Figure: checking time (seconds) vs. FS size, e2fsck vs. ffsck]
FS size  e2fsck  ffsck
150GB    1666    462
300GB    2554    464
450GB    3398    468
600GB    4176    471
750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28
Initialize the disk image by creating directories with small files (4KB-2MB)
Increase the FS size by creating new files and appending data to files (4KB-1MB)
ffsck checking time is determined when the file system is created
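As a quick worked check of the scaling claim, the following Python snippet recomputes ratios from the checking times reported in the figure. The variable names are made up for the example; the numbers are the ones in the figure.

```python
sizes_gb = [150, 300, 450, 600]
e2fsck_s = [1666, 2554, 3398, 4176]   # seconds, from the figure
ffsck_s = [462, 464, 468, 471]        # seconds, from the figure

# Speedup of ffsck over e2fsck at each FS size.
speedups = [e / f for e, f in zip(e2fsck_s, ffsck_s)]

# How much each checker's time grows from the smallest to the largest FS.
e2fsck_growth = e2fsck_s[-1] / e2fsck_s[0]
ffsck_growth = ffsck_s[-1] / ffsck_s[0]

for size, s in zip(sizes_gb, speedups):
    print(f"{size}GB: ffsck is {s:.1f}x faster")
print(f"e2fsck time grows {e2fsck_growth:.2f}x, ffsck time grows {ffsck_growth:.2f}x")
```

While the data grows fourfold, e2fsck's time grows by about 2.5x and ffsck's stays nearly flat, consistent with the checking time being fixed at file system creation.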
Checking Speed Comparison on Aging FS Image
Aging the FS image by performing file creations, appends, truncations and deletions (750GB partition, roughly 95% utilization)
[Figure: checking speed (MB/s) vs. aging operations per group]
Operations/Group  e2fsck on ext3  ffsck on rext3
0                 6.23            61.89
250               5.87            62.31
500               5.43            61.17
750               5.29            60.94
1000              5.16            60.91
Optimal disk bandwidth: 102 MB/s
Much faster and more robust
File System Comparison
Question 1: Can rext3 compete with ext3 in sequential reads?
Question 2: What is its impact on sequential writes?
Questions 3 and 4: What about random reads and macro-benchmarks?
Sequential Read
[Figure: sequential read throughput (MB/sec) vs. file size, ext3 vs. rext3]
File size  ext3  rext3
10KB       0.64  0.64
100KB      6.98  6.21
1MB        51.9  51.1
10MB       100   99.3
100MB      121   119
1GB        121   121
The disk track buffer allows rext3 to match ext3’s performance, except for small reads
Annotated on the figure: 8.4% penalty
Sequential Write
[Figure: sequential write throughput (MB/s) vs. file size, ext3 vs. rext3]
File size  ext3  rext3
12KB       4.62  4.58
100KB      23.4  24.2
1MB        59.1  59.7
10MB       70.5  69.4
100MB      81    88.5
1GB        87.6  104
Annotated on the figure: +9.3% (100MB), +19% (1GB)
Indirect region aids ext3’s ordered journaling mechanism
Random Read
27%-43% improvement
Indirect region benefits from disk buffer
[Figure: random read throughput (MB/s) vs. read number, ext3 vs. rext3]
Read number  ext3   rext3
128          0.329  0.47
256          0.352  0.48
512          0.398  0.505
Randomly read 4KB blocks from a 2GB file
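The 27%-43% range can be reproduced from the throughput values in the figure. The snippet below is a simple worked calculation; the variable names are chosen for the example.

```python
reads = [128, 256, 512]
ext3 = [0.329, 0.352, 0.398]    # MB/s, from the figure
rext3 = [0.47, 0.48, 0.505]     # MB/s, from the figure

# Relative improvement of rext3 over ext3, in percent.
improvement = [(r - e) / e * 100 for e, r in zip(ext3, rext3)]

for n, pct in zip(reads, improvement):
    print(f"{n} reads: {pct:.1f}% improvement")
```

The largest gain is at 128 reads and the smallest at 512 reads, matching the two ends of the quoted range.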
Postmark
[Figure: time (seconds) vs. transaction number, ext3 vs. rext3]
Transactions  ext3  rext3
1000          76    76
2000          165   160
4000          338   341
8000          719   710
Filebench
[Figure: Filebench results by workload, ext3 vs. rext3]
Workload     ext3  rext3
File Server  2.5   2.4
Web Server   3.43  3.23
Varmail      0.57  0.57
Competitive performance
Summary
Make fast repair a primary concern of FS design
FS provides direct support for the fast checker
Benefits:
• 10 times faster checking
• Big improvement for large writes and random reads
• Small penalty for small reads
Conclusion
• How to protect against corruption is well known
• File system repair is important but receives little attention
• Build the checker as an integral component rather than a peripheral addition
• ffsck is not a universal solution; other file systems may require other methods
Thanks!
Questions?
Wisconsin Institute on Software-defined Datacenters in Madison http://wisdom.cs.wisc.edu/