ffsck: The Fast File System Checker

+ Ao Ma, + Chris Dragga, + Andrea C. Arpaci-Dusseau, + Remzi H. Arpaci-Dusseau

* Backup Recovery Systems Division, EMC

+ University of Wisconsin, Madison


Background

File system

• Data integrity is critically important

• Should be robust and reliable

Factors that can corrupt the FS

• Unclean shutdown / system crash

• File system bugs

• Hardware failures


Existing Solutions

• Journaling

• Copy-on-Write

• Soft updates

• Debugging

• Bug finding tools

• Checksums

• Scrubbing

Faults addressed: unclean shutdown, system crash, bugs, hardware failures

These techniques reduce the probability of faults but can’t protect against all of them


fsck: Last Resort To Repair

Approach

• Scan the FS offline

• Check redundant metadata information

• Restore a damaged FS to a usable state

Dilemma

• fsck is slow, causing long downtime

• Checking time is unpredictable


fsck Challenges

File systems are evolving

• Capacity is increasing

• More complexity (more bugs and hardware failures)

Impact on fsck

• Longer downtime

• Used more frequently


What we did

Fast file system checker (ffsck)

• 10 times faster with identical checking policy

A modified version of ext3 (rext3)

• 20% improvement in big writes

• 43% improvement in random reads

• 10% degradation in small reads


Outline

fsck analysis

• Checking approach

• Performance

• FS tradeoffs

ffsck & rext3

• Goals

• Novel features

Evaluation


E2fsck Checking Approach

Phase   Checking tasks
1       Scan inodes and indirect blocks in a logical order
2       Check each directory individually
3       Check directory connectivity
4       Check inode reference count and remove orphan inodes
5       Update the on-disk copies if necessary


Performance Analysis

750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28

Initialize disk image by creating directories with small files (4KB-2MB)

Increase FS size by creating new files and appending data to files (4KB-1MB)

E2fsck doesn’t scale well

Phase 1 dominates the checking time

[Figure: e2fsck total checking time (seconds) vs. FS size, stacked by phase 1 to phase 5]

FS size   Total checking time (seconds)
150 GB    1666
300 GB    2554
450 GB    3398
600 GB    4176

Cumulative Time Spent on Reading Indirect Blocks and Inode Blocks

Indirect blocks are the bottleneck

[Figure: cumulative read time in milliseconds (0 to 500) vs. read block number (1 to 101), for inode blocks and indirect blocks]

File System Design Tradeoffs

Identical allocation for data and indirect blocks

• Pros: stores them contiguously and facilitates sequential access

• Cons: metadata is scattered across the disk

Relying on the tree structure to locate indirect blocks

• Pros: simple and straightforward

• Cons: imposes a strict ordering of access

Fsck: An Afterthought Design

Repairing capability is not prioritized

• File system has limited support for the checker

• Checker is developed as a peripheral addition rather than a tightly integrated component


Ffsck and Rext3

Prioritize fast repair when designing the FS

• Fast scan

• Robust checking performance

• Competitive FS performance

ffsck and rext3 are based on e2fsck and ext3, respectively

Basic Layout of Ext3

The disk is divided into block groups (Block Group 0, ..., Block Group i, ..., Block Group n). Layout of a block group:

• Super block

• Group Descriptor

• Data Bitmap

• Inode Bitmap

• Inode table

• Data blocks

Overview of ffsck and rext3

New disk layout

Disk-order scan

Self-check and Cross-check

Fast recovery with bitmap snapshot


Rext3 Disk Layout

• Decouple allocation

• Improve metadata density

• Indirect region holds indirect blocks and directory data blocks

Rext3 block group: Super block, Group Descriptor, Data Bitmap, Inode Bitmap, Inode table, Indirect region, Data blocks

New allocation example: inode (Ino) in the inode table; indirect blocks (Ind1, Ind2) in the indirect region; data blocks (D1, …, Da, Db) in the data blocks area
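The decoupled allocation idea can be illustrated with a toy allocator. This is a minimal sketch, not rext3's actual allocator: the `BlockGroup` class, its field names, and the block numbers are all hypothetical, and it assumes each block group simply hands out the next free block from either an indirect region or a data region depending on the block type.

```python
from dataclasses import dataclass, field


@dataclass
class BlockGroup:
    """Toy block group with separate indirect and data regions (hypothetical)."""
    indirect_start: int  # first block of the indirect region
    data_start: int      # first block of the data region
    end: int             # one past the last block of the group
    next_indirect: int = field(init=False)
    next_data: int = field(init=False)

    def __post_init__(self):
        self.next_indirect = self.indirect_start
        self.next_data = self.data_start

    def alloc(self, is_metadata: bool) -> int:
        """Return the next free block from the region matching the block type."""
        if is_metadata:
            if self.next_indirect >= self.data_start:
                raise RuntimeError("indirect region full")
            blk = self.next_indirect
            self.next_indirect += 1
        else:
            if self.next_data >= self.end:
                raise RuntimeError("data region full")
            blk = self.next_data
            self.next_data += 1
        return blk


# Interleaved requests still land in two dense, separate regions.
group = BlockGroup(indirect_start=100, data_start=200, end=1000)
ind1 = group.alloc(is_metadata=True)   # indirect block
d1 = group.alloc(is_metadata=False)    # data block
ind2 = group.alloc(is_metadata=True)   # indirect block
da = group.alloc(is_metadata=False)    # data block
```

Even though indirect and data requests arrive interleaved, the indirect blocks end up adjacent to each other, which is what raises metadata density.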

Rext3 Disk Layout

Does the new layout cause additional seeks?

• The disk track buffer helps hide them

[Figure: disk platter with sectors 1 to 16, spindle, heads, and track buffer]

Buffer size keeps increasing: 16MB, 32MB, 64MB, 128MB

Disk-order Scan

• Most efficient way to scan metadata

• Predictable scanning time

Each block group (block group 0, block group 1, ..., block group n) consists of a metadata region (Super block, Group Descriptor, Data Bitmap, Inode Bitmap, Inode table, Indirect region) followed by a data region

Scan pattern: read the metadata region, seek past the data region, and repeat for each block group
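To see why visiting metadata in disk order helps, the following sketch compares total head movement for the same set of blocks visited in a tree-walk (logical) order and in ascending disk order. The block addresses are made up for illustration, and the cost model (sum of absolute address differences) is a simplifying assumption rather than a real disk model.

```python
def total_seek_distance(block_order, start=0):
    """Sum of absolute head movements when visiting blocks in the given order."""
    pos, dist = start, 0
    for blk in block_order:
        dist += abs(blk - pos)
        pos = blk
    return dist


# Hypothetical metadata block addresses, in the order a tree walk reaches them.
logical_order = [500, 20, 900, 130, 640, 70]
# Disk-order scan visits the same blocks in ascending address order.
disk_order = sorted(logical_order)

logical_cost = total_seek_distance(logical_order)
disk_cost = total_seek_distance(disk_order)
print(logical_cost, disk_cost)
```

In disk order the head sweeps across the addresses once, so the cost is bounded by the highest address regardless of how files are arranged, which matches the slide's point about predictable scanning time.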

Memory Pressure

Disk-order scan accesses all the indirect blocks without using the indirect tree

• Can’t perform checking until all the related metadata are cached

• Impractical for large-scale FS

Self-check, Cross-check

Separate self-check and cross-check

• Self-ID is added

• Finish most checks without referring to other metadata (self-check)

Self-check and discard

• Once self-check is performed, discard the fields that cross-check does not need

Example: Self-check & Discard

Indirect block disk copy: ino, Blk 11, Blk 12, Blk …, Blk 94

Self-check: 1. blk range check 2. bitmap

Indirect block memory copy (after self-check): ino, its own LBA, Blk_num = 84, last pointer offset = 83, last pointer: 94

Compression ratio is nearly 250:1
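The self-check-and-discard step from this example might look roughly like the sketch below. It is an illustration under stated assumptions, not ffsck's code: the `IndirectSummary` record, the function name, the use of a Python set as the block bitmap, and the inode number and LBA values are all hypothetical. It assumes a zero pointer marks an unused slot.

```python
from dataclasses import dataclass


@dataclass
class IndirectSummary:
    """Compact in-memory copy kept for cross-check (hypothetical record)."""
    ino: int           # owning inode (self-ID)
    lba: int           # the indirect block's own address
    blk_num: int       # number of valid pointers
    last_offset: int   # slot index of the last valid pointer
    last_pointer: int  # value of the last valid pointer


def self_check_and_discard(ino, lba, pointers, fs_blocks, bitmap):
    """Range-check every pointer, mark it in the bitmap, keep only a summary."""
    count, last_offset, last_pointer = 0, -1, 0
    for offset, ptr in enumerate(pointers):
        if ptr == 0:                    # assumption: 0 means unused slot
            continue
        if not (0 < ptr < fs_blocks):   # 1. block range check
            raise ValueError(f"pointer {ptr} out of range")
        bitmap.add(ptr)                 # 2. record the block in the bitmap
        count += 1
        last_offset, last_pointer = offset, ptr
    return IndirectSummary(ino, lba, count, last_offset, last_pointer)


# Mirror the slide: pointers Blk 11 .. Blk 94, rest of the 1024 slots unused.
pointers = list(range(11, 95)) + [0] * (1024 - 84)
bitmap = set()
summary = self_check_and_discard(ino=7, lba=10, pointers=pointers,
                                 fs_blocks=100_000, bitmap=bitmap)
print(summary)
```

After the call, the full pointer array can be dropped and only the small summary stays in memory for the later cross-check.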

Example: Cross-check of File Size

Inode: last pointer offset: 13, last pointer: 36

Double indirect block (LBA: 36): last pointer offset: 2, last pointer: 157

Indirect block (LBA: 157): last pointer offset: 12, last pointer: 950

1. Partially rebuild the tree structure

2. Calculate the file size using the offsets
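The file-size calculation from these offsets can be worked through as follows. This sketch assumes 4 KB blocks with 4-byte pointers (1024 pointers per indirect block), the standard ext3 inode layout in which slots 0 to 11 are direct, slot 12 is single indirect, and slot 13 is double indirect, and a file with no holes. The function name is hypothetical and only the double-indirect case from the slide is handled.

```python
PTRS_PER_BLOCK = 1024  # assumption: 4 KB blocks, 4-byte block pointers
DIRECT_SLOTS = 12      # ext3 inode: slots 0-11 are direct pointers
BLOCK_SIZE = 4096      # assumption: 4 KB blocks


def blocks_from_offsets(inode_last, dind_last, ind_last):
    """Data-block count when the last block sits under the double-indirect tree."""
    if inode_last != 13:
        raise ValueError("sketch only handles the double-indirect case")
    direct = DIRECT_SLOTS
    single = PTRS_PER_BLOCK                 # the single-indirect block is full
    full_inds = dind_last * PTRS_PER_BLOCK  # indirect blocks before the last are full
    tail = ind_last + 1                     # slots 0..ind_last of the last indirect block
    return direct + single + full_inds + tail


# Offsets from the slide: inode 13, double indirect 2, indirect block 12.
blocks = blocks_from_offsets(inode_last=13, dind_last=2, ind_last=12)
size_bytes = blocks * BLOCK_SIZE
print(blocks, size_bytes)
```

The result is an upper bound on the file size in whole blocks; a checker would compare it against the size recorded in the inode.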

Fast Recovery with Bitmap Snapshot

e2fsck: costly double scan of inodes and indirect blocks

• Detect multiply-claimed blocks in the 1st scan

• Detect their owners during the 2nd scan

ffsck: 1 full scan + 1 partial rescan

• ffsck builds a list of bitmap snapshots to limit the rescan’s scope

Fast Recovery with Bitmap Snapshot

Create a snapshot for each group of inodes

Block       1 2 3 4 5 6 7
initial     0 0 0 0 0 0 0
snapshot1   1 1 0 1 0 0 0
snapshot2   1 1 1 1 1 1 0
snapshot3   1 1 1 1 1 1 1

Only need to rescan the group of inodes for snapshot1
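One way to picture how snapshots limit the rescan is the sketch below. It is a simplified model, not ffsck's implementation: bitmaps are represented as Python sets, the function names are hypothetical, and the per-group claim sets are chosen so that the cumulative snapshots match the table above, with the third group additionally claiming block 2. Duplicate claims inside a single inode group are not modeled.

```python
def build_snapshots(groups):
    """groups: blocks claimed by each inode group, in scan order.

    Returns the cumulative bitmap after each group and the set of blocks
    that were claimed again by a later group.
    """
    seen, snapshots, dups = set(), [], set()
    for claimed in groups:
        dups |= seen & claimed   # already set before this group: multiply claimed
        seen |= claimed
        snapshots.append(frozenset(seen))
    return snapshots, dups


def groups_to_rescan(snapshots, dups):
    """The first snapshot containing a duplicate identifies its first claimant's group."""
    result = set()
    for blk in dups:
        for i, snap in enumerate(snapshots):
            if blk in snap:
                result.add(i)
                break
    return result


groups = [{1, 2, 4}, {3, 5, 6}, {7, 2}]  # third group also claims block 2
snapshots, dups = build_snapshots(groups)
rescan = groups_to_rescan(snapshots, dups)
print(sorted(dups), sorted(rescan))
```

Here block 2 is first set in snapshot1 (index 0), so only that group of inodes has to be rescanned to find the other owner.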

Summary

Decouple allocation

• Improve metadata density

Disk-order scan

• Most efficient scan approach

Self-check & cross-check

• Avoid memory saturation

Fast recovery with bitmap snapshot

• One full scan + partial rescan


Checking Performance Comparison

Question 1: Will ffsck scale well?

Question 2: Can ffsck perform consistently as the FS ages?

Checking Time Comparison

[Figure: checking time (seconds) vs. FS size for e2fsck and ffsck]

FS size   e2fsck   ffsck
150GB     1666     462
300GB     2554     464
450GB     3398     468
600GB     4176     471

750GB disk, 1GB memory, e2fsprogs 1.41.12, Linux 2.6.28

Initialize disk image by creating directories with small files (4KB-2MB)

Increase FS size by creating new files and appending data to files (4KB-1MB)

ffsck checking time is determined when the file system is created
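The speedup implied by the checking-time comparison can be computed directly from the reported numbers. The snippet below only does that arithmetic; the variable names are illustrative.

```python
sizes_gb = [150, 300, 450, 600]
e2fsck_s = [1666, 2554, 3398, 4176]  # e2fsck checking time in seconds
ffsck_s = [462, 464, 468, 471]       # ffsck checking time in seconds

# Ratio of e2fsck time to ffsck time at each file system size.
speedups = [round(e / f, 2) for e, f in zip(e2fsck_s, ffsck_s)]
for size, s in zip(sizes_gb, speedups):
    print(f"{size} GB: {s}x")
```

On these measurements the gap widens as the file system fills, because e2fsck's time grows with FS size while ffsck's stays nearly flat.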

Checking Speed Comparison on Aging FS Image

Aging FS image by performing file creations, appends, truncations and deletions (750GB partition, roughly 95% utilization)

[Figure: checking throughput (MB/s) vs. operations/group]

Operations/group   e2fsck on ext3   ffsck on rext3
0                  6.23             61.89
250                5.87             62.31
500                5.43             61.17
750                5.29             60.94
1000               5.16             60.91

Optimal disk bandwidth: 102 MB/s

ffsck is much faster and more robust

File System Comparison

Question 1: Can rext3 compete with ext3 in sequential reads?

Question 2: What is its impact on sequential writes?

Questions 3 and 4: What about random reads and macro-benchmarks?

Sequential Read

[Figure: sequential read throughput (MB/sec) vs. file size]

File size   ext3   rext3
10KB        0.64   0.64
100KB       6.98   6.21
1MB         51.9   51.1
10MB        100    99.3
100MB       121    119
1GB         121    121

The disk track buffer allows rext3 to match ext3’s performance, except for small reads (8.4% penalty)

Sequential Write

[Figure: sequential write throughput (MB/s) vs. file size]

File size   ext3   rext3
12KB        4.62   4.58
100KB       23.4   24.2
1MB         59.1   59.7
10MB        70.5   69.4
100MB       81     88.5
1GB         87.6   104

Indirect region aids ext3’s ordered journaling mechanism

Chart annotations: +9.3% (100MB) and +19% (1GB)

Random Read

Randomly read 4KB blocks from a 2GB file

[Figure: random read throughput (MB/s) vs. read number]

Read number   ext3    rext3
128           0.329   0.47
256           0.352   0.48
512           0.398   0.505

27% - 43% improvement

Indirect region benefits from disk buffer

Postmark

[Figure: Postmark time (seconds) vs. transaction number]

Transaction number   ext3   rext3
1000                 76     76
2000                 165    160
4000                 338    341
8000                 719    710

Filebench

Workload      ext3   rext3
File Server   2.5    2.4
Web Server    3.43   3.23
Varmail       0.57   0.57

Competitive performance

Summary

Make fast repair a primary concern of FS design

FS provides direct support for the fast checker

Benefits:

• 10 times faster checking

• Big improvement for large writes and random reads

• Small penalty for small reads


Conclusion

• How to protect against corruption is well known

• FS repair is important but receives little attention

• Build the checker as an integral component rather than a peripheral addition

• ffsck is not a universal solution; other FSes may require other methods

Thanks!

Questions?


Wisconsin Institute on Software-defined Datacenters in Madison http://wisdom.cs.wisc.edu/