Journal-guided Resynchronization for Software RAID

Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison


Transcript of Journal-guided Resynchronization for Software RAID

Page 1: Journal-guided Resynchronization for Software RAID

Journal-guided Resynchronization for Software RAID

Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau

University of Wisconsin, Madison

Page 2: Journal-guided Resynchronization for Software RAID

RAID Consistent Update Problem

• RAID task is to maintain consistency

• Challenging in the face of crashes
  – Updates must be applied to more than one disk

• Inconsistency means window of vulnerability
  – Disk failure may lead to data loss

[Figure: disk array diagram; P marks parity blocks]
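The window of vulnerability can be illustrated with a toy parity stripe. This is a minimal sketch, not the authors' code: it assumes XOR parity over three data blocks, and the helper name `xor_blocks` and the block values are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A toy stripe: three data blocks plus one parity block.
data = [b"\x01\x01", b"\x02\x02", b"\x04\x04"]
parity = xor_blocks(data)          # consistent: parity == d0 ^ d1 ^ d2

# Crash in the middle of an update: the new data reaches disk 0,
# but the matching parity write is lost.
data[0] = b"\x08\x08"
stale_parity = parity              # parity still reflects the old d0

# Later, disk 1 fails; its contents are rebuilt from the survivors and parity.
rebuilt_d1 = xor_blocks([data[0], data[2], stale_parity])
print(rebuilt_d1 == b"\x02\x02")   # False: the rebuilt block is wrong
```

The data on disk 1 was never touched by the update, yet it is lost once the stripe is inconsistent and that disk fails.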

Page 3: Journal-guided Resynchronization for Software RAID

High-end RAID Solution

• Consistent update with non-volatile memory
  – Logs writes in NVRAM until they reach disk

• Performance – logging to NVRAM is fast

• Reliability – data is safe in NVRAM

• Availability – recovery is fast

• But, enterprise systems are expensive

Page 4: Journal-guided Resynchronization for Software RAID

Software RAID Solutions

• Consistent update is challenging
  – Performance versus reliability trade-off

• Performance: resynchronization after crash
  – Scan entire volume to fix inconsistencies
  – Extremely slow, hours for 100s of GBs to days for TBs
  – Reliability: lengthens window of vulnerability
  – Availability: consumes array bandwidth

• Reliability: log intentions to a bitmap
  – Performance: extra writes to maintain bitmap
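The trade-off between the two approaches can be sketched as follows. This is an illustrative model only: the in-memory stripe layout, the function names `resync` and `make_array`, and the 1000-stripe array are assumptions, not the Linux implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def resync(stripes, candidates):
    """Recompute parity for the candidate stripe numbers only.

    Returns (stripes_scanned, stripes_repaired)."""
    scanned = repaired = 0
    for n in candidates:
        scanned += 1
        expected = xor_blocks(stripes[n]["data"])
        if stripes[n]["parity"] != expected:
            stripes[n]["parity"] = expected
            repaired += 1
    return scanned, repaired

def make_array(count=1000, stale=42):
    """Build a toy array with one stale parity block (assumed scenario)."""
    stripes = [{"data": [bytes([i % 256]), b"\x10"], "parity": b""}
               for i in range(count)]
    for s in stripes:
        s["parity"] = xor_blocks(s["data"])
    stripes[stale]["parity"] = b"\xff"   # parity write lost in the crash
    return stripes

# Full scan: every stripe is read, whether or not it was being written.
full = resync(make_array(), range(1000))

# Intent bitmap: only stripes marked dirty before the crash are read,
# at the cost of an extra bitmap write before each update.
bitmap = resync(make_array(), {42})
print(full, bitmap)
```

Both runs repair the same stripe; the difference is how many stripes must be read to find it.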

Page 5: Journal-guided Resynchronization for Software RAID

Cooperative Software RAID Solution

• Journaling file systems perform logging
  – Maintain file system data structure consistency
  – ext3, ReiserFS, JFS, NTFS

• Journal-guided resynchronization
  – New ext3 mode: declared mode
  – New software RAID interface: verify read
  – Achieves performance, reliability, availability

Page 6: Journal-guided Resynchronization for Software RAID

Journal-guided Resync Overview

• Crash: What writes were outstanding?
  – Narrow the range of possible inconsistencies
  – Obtain information from journal (declared mode)

• Restart: journal-guided resynchronization
  – Use journal to identify outstanding writes
  – Communicate locations to RAID (verify read)
  – Check redundancy and repair inconsistencies
  – Greatly reduce the time for resynchronization

Page 7: Journal-guided Resynchronization for Software RAID

Outline

• Problem

• ext3 Background and Analysis

• ext3 Declared Mode and RAID Verify Read

• Journal-guided Resynchronization

• Evaluation

• Conclusion

Page 8: Journal-guided Resynchronization for Software RAID

ext3 Modes

• Data-journaling mode
  – All data and metadata is written to the journal

• Ordered mode (default)
  – Only metadata is written to the journal
  – Strict ordering between data and metadata

• Writeback mode
  – Only metadata is written to the journal
  – No ordering between data and metadata

Page 9: Journal-guided Resynchronization for Software RAID

ext3 Transactions

• Updates are grouped into transactions

• Transaction states
  – Running – collect updates in memory
  – Commit – write updates to journal
  – Checkpoint – write updates to home locations

Page 10: Journal-guided Resynchronization for Software RAID

ext3 Journal Structures

• Journal superblock
  – Head and tail pointers into journal file
  – Transaction sequence number

• Descriptor block
  – List of home locations for upcoming blocks

• Commit block
  – Marks the end of a transaction
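The three structures can be modeled roughly as below. This is a sketch for illustration, not the on-disk ext3 layout: the class and field names are hypothetical, and tagging descriptor and commit blocks with a `sequence` field is an assumption made so that a transaction's blocks can be matched up.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JournalSuperblock:
    head: int        # pointer into the journal file
    tail: int        # pointer into the journal file
    sequence: int    # transaction sequence number

@dataclass
class DescriptorBlock:
    sequence: int              # assumed tag linking the block to a transaction
    home_locations: List[int]  # home locations for the upcoming journaled blocks

@dataclass
class CommitBlock:
    sequence: int    # assumed tag; the block marks the end of that transaction

def committed_sequences(journal_blocks):
    """Sequence numbers of transactions whose commit block is in the journal."""
    return {b.sequence for b in journal_blocks if isinstance(b, CommitBlock)}

# Transaction 11 is complete; transaction 12 has no commit block yet.
journal = [
    DescriptorBlock(sequence=11, home_locations=[7, 8]),
    CommitBlock(sequence=11),
    DescriptorBlock(sequence=12, home_locations=[9]),
]
print(committed_sequences(journal))
```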

Page 11: Journal-guided Resynchronization for Software RAID

Data-journaling Write Analysis

[Figure: journal, superblock, and disk array (P marks parity blocks), showing transaction 11 (DESC, METADATA, DATA, COMM) in the Running, Committing, and Checkpointing states]

• Running
  – Collect file system updates in memory

• Commit
  – Write desc, meta, data to journal, wait (bounded)
  – Write commit to journal, wait (bounded)

• Checkpoint
  – Write journaled blocks to home, wait (known)
  – Update superblock (known)

Page 12: Journal-guided Resynchronization for Software RAID

Data-journaling Summary

• Provides a record of all outstanding writes
  – Suitable for journal-guided resynchronization

• Offers poor performance

Block Type      Write Location
superblock      known, fixed
journal         bounded, fixed
home metadata   known, descriptors
home data       known, descriptors

Page 13: Journal-guided Resynchronization for Software RAID

Ordered Write Analysis

[Figure: journal, superblock, and disk array (P marks parity blocks), showing transaction 11 (DESC, METADATA, COMM) in the Running and Committing states]

• Running
  – Collect file system updates in memory
  – pdflush may write data to home (unknown)

• Commit
  – Write data to home, wait (unknown)
  – Write desc and meta to journal, wait (bounded)
  – Write commit to journal, wait (bounded)

Page 14: Journal-guided Resynchronization for Software RAID

Ordered Summary

• Does not provide outstanding write record
  – Unsuitable for journal-guided resynchronization

Block Type      Write Location
superblock      known, fixed
journal         bounded, fixed
home metadata   known, descriptors
home data       unknown

Page 15: Journal-guided Resynchronization for Software RAID

Outline

• Problem

• ext3 Background and Analysis

• ext3 Declared Mode and RAID Verify Read

• Journal-guided Resynchronization

• Evaluation

• Conclusion

Page 16: Journal-guided Resynchronization for Software RAID

Declared Mode

• Variation of ordered mode
  – Only metadata is journaled, strict ordering

• Declares its intent to write to home locations

• New journal structure: declare block
  – List of home data locations for the transaction

• Space and performance overheads

Page 17: Journal-guided Resynchronization for Software RAID

Declared Write Analysis

[Figure: journal, superblock, and disk array (P marks parity blocks), showing transaction 11 (DECL, DESC, METADATA, COMM) in the Running and Committing states]

• Running
  – Collect file system updates in memory
  – pdflush may write data to home (unknown)

• Commit
  – Write declare to journal, wait (bounded)
  – Write data to home, wait (known)
  – Write desc and meta to journal, wait (bounded)
  – Write commit to journal, wait (bounded)
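The commit ordering for declared mode can be sketched as a list of steps plus a check of the key invariant: a home data location is written only after it has been declared in the journal. The function names, the tuple layout, and the block numbers are hypothetical, and the sketch covers the commit phase only.

```python
def declared_commit(seq, data_locs, meta_locs):
    """Ordered steps of one declared-mode commit as (target, payload) tuples.

    Each step is assumed to complete (wait) before the next one starts."""
    return [
        ("journal", ("DECL", seq, tuple(data_locs))),  # declare intent
        ("home",    ("DATA", tuple(data_locs))),       # write data to home
        ("journal", ("DESC", seq, tuple(meta_locs))),  # descriptor and metadata
        ("journal", ("COMM", seq)),                    # commit block
    ]

def declared_before_data(steps):
    """True if every home data write was preceded by a declare block covering it."""
    declared = set()
    for _target, payload in steps:
        if payload[0] == "DECL":
            declared.update(payload[2])
        elif payload[0] == "DATA" and not set(payload[1]) <= declared:
            return False
    return True

steps = declared_commit(11, data_locs=[100, 101], meta_locs=[7])
print(declared_before_data(steps))

# Swapping the first two steps reproduces the ordered-mode problem:
# data reaches home locations that the journal knows nothing about.
reordered = [steps[1], steps[0]] + steps[2:]
print(declared_before_data(reordered))
```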

Page 18: Journal-guided Resynchronization for Software RAID

Software RAID Verify Read

• File system must communicate possible inconsistencies to the software RAID layer

• New interface: verify read request
  – Read block and verify its redundant information
  – Repair redundant information if inconsistent

[Figure: disk array with parity (P) blocks; the stored parity is compared against the XOR of the stripe's data blocks]
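A verify read on a single parity stripe might look like the sketch below. It assumes XOR parity and an in-memory stripe dictionary; the signature of `verify_read` is invented for illustration and is not the interface added to the Linux RAID layer.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def verify_read(stripe, index):
    """Read data block `index`, check the stripe's parity, repair it if stale.

    Returns (block, repaired)."""
    block = stripe["data"][index]
    expected = xor_blocks(stripe["data"])
    repaired = stripe["parity"] != expected
    if repaired:
        stripe["parity"] = expected   # redundancy is rewritten, data is kept
    return block, repaired

stripe = {"data": [b"\x08", b"\x02", b"\x04"], "parity": b"\x07"}  # stale parity
print(verify_read(stripe, 0))   # first call repairs the parity
print(verify_read(stripe, 0))   # second call finds the stripe consistent
```

An ordinary read would return the block without touching the other disks; the verify read pays for reading the whole stripe, so it is only worth issuing for locations the journal flags as possibly inconsistent.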

Page 19: Journal-guided Resynchronization for Software RAID

Outline

• Problem

• ext3 Background and Analysis

• ext3 Declared Mode and RAID Verify Read

• Journal-guided Resynchronization

• Evaluation

• Conclusion

Page 20: Journal-guided Resynchronization for Software RAID

Journal-guided Resynchronization

[Figure: journal holding DECL 11, DESC 11, METADATA, COMM 11, and DECL 12, with superblock and disk array (P marks parity blocks)]

Recovery and Resynchronization:

• Superblock write: verify read for superblock
• Checkpointing: verify reads for descriptor home locations
• Committing: verify reads for head of the journal
• Home data writes: verify reads for declared home locations
• Checkpoint committed transactions
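The recovery steps can be sketched as a function that gathers the locations to verify-read. This reflects one reading of the slide, stated here as an assumption: committed transactions may have been checkpointing (so their descriptor home locations are checked), while an uncommitted transaction may have been writing the journal head and its declared home data. The dictionary layout and block numbers are hypothetical.

```python
def resync_targets(journal):
    """Collect block locations that may be inconsistent after a crash."""
    targets = [journal["superblock"]]                 # superblock write
    for txn in journal["transactions"]:
        if txn["committed"]:
            targets.extend(txn["descriptor_homes"])   # checkpoint in flight
        else:
            targets.extend(txn["journal_blocks"])     # head of the journal
            targets.extend(txn["declared_homes"])     # home data writes
    return list(dict.fromkeys(targets))               # drop repeats, keep order

def journal_guided_resync(journal, verify_read):
    """Issue one verify read per target; `verify_read` is a caller-supplied callback."""
    targets = resync_targets(journal)
    for loc in targets:
        verify_read(loc)
    return targets

# State matching the slide: transaction 11 committed, transaction 12 declared only.
journal = {
    "superblock": 0,
    "transactions": [
        {"committed": True, "descriptor_homes": [7, 8],
         "journal_blocks": [5000, 5001], "declared_homes": [100, 101]},
        {"committed": False, "descriptor_homes": [],
         "journal_blocks": [5002], "declared_homes": [200, 201]},
    ],
}
checked = []
print(journal_guided_resync(journal, checked.append))
```

The number of verify reads depends on what the journal records, not on the size of the array. Replaying (checkpointing) the committed transactions would follow as the usual ext3 recovery step and is not shown.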

Page 21: Journal-guided Resynchronization for Software RAID

Outline

• Problem

• ext3 Background and Analysis

• ext3 Declared Mode and RAID Verify Read

• Journal-guided Resynchronization

• Evaluation

• Conclusion

Page 22: Journal-guided Resynchronization for Software RAID

Declared Mode Evaluation

• Microbenchmarks (versus ordered mode)
  – Random write (3% slowdown)
  – Sequential write (5% slowdown)
  – Sprite create, read, unlink (4% slowdown)

• Macrobenchmarks
  – ssh Benchmark (3% speedup for unpack)
  – Postmark (40% speedup - 5% slowdown)
    • Speedup from globally sorted write order
  – TPC-B (20% - 5% slowdown)
    • Small transaction size increases declare overhead

Page 23: Journal-guided Resynchronization for Software RAID

Implementation Complexity

• Cooperative approach reduces complexity

Journal-guided Resynchronization

Module            Original Lines   Modified Lines   Change
Software RAID-5   3475             18               0.5 %
ext3              8621             69               0.8 %
Journaling        3472             308              8.9 %
Total             15568            395              2.5 %

Linux RAID-1 Intent Bitmap Logging

Module            Original Lines   Modified Lines   Change
Software RAID-1   3116             1193             38.3 %
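The Change column is the ratio of modified to original lines. A quick check of the table's arithmetic, using only the figures above (the variable names are illustrative):

```python
# (original lines, modified lines) per module, copied from the table
journal_guided = {
    "Software RAID-5": (3475, 18),
    "ext3": (8621, 69),
    "Journaling": (3472, 308),
}
bitmap_logging = {"Software RAID-1": (3116, 1193)}

def change_percent(original, modified):
    """Modified lines as a percentage of original lines, to one decimal."""
    return round(100 * modified / original, 1)

total_original = sum(o for o, _ in journal_guided.values())
total_modified = sum(m for _, m in journal_guided.values())

for name, (orig, mod) in {**journal_guided, **bitmap_logging}.items():
    print(f"{name}: {change_percent(orig, mod)} %")
print(f"Total: {total_original} {total_modified} "
      f"{change_percent(total_original, total_modified)} %")
```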

Page 24: Journal-guided Resynchronization for Software RAID

Resynchronization Experiment

• Five-disk, 1 GB RAID-5 array

• Foreground process reading a set of files

• After 30 seconds, crash and restart machine
  – Resynchronization begins
  – Foreground process restarts

• Monitor foreground bandwidth and resync

Page 25: Journal-guided Resynchronization for Software RAID

Resynchronization Results

• Availability: foreground BW from 29.6 to 34.1 MB/s

• Reliability: vulnerability from 254 to 0.21 seconds
  – Reduced from O(array size) to O(journal size)
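The size of the improvement follows directly from the reported numbers. The sketch below assumes that the first value in each pair is the default full-scan resynchronization and the second is journal-guided resynchronization; the variable names are illustrative.

```python
# Figures reported on this slide.
full_scan_window_s = 254.0       # vulnerability window, assumed default resync
journal_guided_window_s = 0.21   # vulnerability window, assumed journal-guided
bw_before_mb_s = 29.6            # foreground bandwidth during resync
bw_after_mb_s = 34.1

window_reduction = full_scan_window_s / journal_guided_window_s
bw_gain_percent = 100 * (bw_after_mb_s - bw_before_mb_s) / bw_before_mb_s

print(f"vulnerability window shrinks by a factor of about {window_reduction:.0f}")
print(f"foreground bandwidth improves by about {bw_gain_percent:.1f} %")
```

Because the full scan grows with array size while the journal-guided pass grows with journal size, the ratio would be expected to widen on arrays larger than the 1 GB one used in the experiment.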

Page 26: Journal-guided Resynchronization for Software RAID

Outline

• Problem

• ext3 Background and Analysis

• ext3 Declared Mode and RAID Verify Read

• Journal-guided Resynchronization

• Evaluation

• Conclusion

Page 27: Journal-guided Resynchronization for Software RAID

Conclusion

• RAID consistent updates are challenging

• Analyzed ext3 journaling, declared mode
  – Identifies outstanding writes after a crash

• Software RAID verify read interface

• Journal-guided Resynchronization
  – Leverages functionality, reducing complexity
  – Provides performance, reliability, and availability

• Cooperation between layers is the key

Page 28: Journal-guided Resynchronization for Software RAID

Questions?

http://www.cs.wisc.edu/adsl/