FlashVM: Virtual Memory Management on Flash

Mohit Saxena, Michael M. Swift

University of Wisconsin-Madison

USENIX Annual Technical Conference, USA, June 2010


Outline

• Introduction

• Design Overview

• Design and Implementation

• Evaluation

• Conclusion

Introduction (1/5)

• With the decreasing price of flash memory, systems will increasingly use solid-state storage rather than disks for virtual-memory paging

• FlashVM is a system architecture and a core virtual memory subsystem built in the Linux kernel that uses dedicated flash for paging

• Design Goals:

– High performance

– Reduced flash wear-out for improved reliability

– Efficient garbage collection

Introduction (3/5)

Figure: The garbage collection process. Block X contains invalid pages; all valid data is copied to another free block, and Block X is then erased to reclaim the space.

Introduction (4/5)

Figure: Garbage collection without the TRIM command.

Introduction (5/5)

Figure: Garbage collection with the TRIM command.
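To make the difference between the last two figures concrete, here is a minimal Python sketch of the copy cost the SSD pays when cleaning one block. It assumes a simplified model in which each page is "valid", "invalid", or "free" from the SSD's point of view, and `os_freed` (a hypothetical name) holds pages the OS has already released. Without TRIM the SSD cannot know about those pages, so it must copy them as if they were still live.

```python
def gc_copy_cost(pages: list[str], os_freed: set[int], trim: bool) -> int:
    """Count pages the SSD must copy to a free block before erasing this one.

    pages[i] is the SSD's view of page i: "valid", "invalid", or "free".
    os_freed holds indices the OS no longer needs (hypothetical input).
    """
    copies = 0
    for i, state in enumerate(pages):
        if state != "valid":
            continue  # invalid or free pages are simply erased
        if trim and i in os_freed:
            continue  # TRIM already told the SSD this page is dead
        copies += 1  # without TRIM, stale data still looks valid
    return copies


block = ["valid", "invalid", "valid", "valid", "invalid", "valid", "valid", "valid"]
freed_by_os = {2, 3, 5}
print(gc_copy_cost(block, freed_by_os, trim=False))  # 6 copies
print(gc_copy_cost(block, freed_by_os, trim=True))   # 3 copies
```

Fewer copies mean less write amplification and faster cleaning, which is the benefit of telling the device about freed swap pages.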

Design Overview (1/2)

Design Overview (2/2)

• Flash Management

– Three problems with flash

• write amplification, low reliability, and aging

– FlashVM leverages semantic information available only within the OS to address these problems

• FlashVM Architecture

– targets dedicated flash for virtual memory paging

• is cheaper, because virtual memory requires only a small capacity

• minimizes interference between file system I/O and virtual-memory paging traffic

Design and Implementation (1/7)

• FlashVM Performance

– Page Write-Back

• Pre-cleaning

– eagerly swaps out dirty pages before new pages are needed

– the Linux page-out daemon kswapd runs periodically to write out 32 pages from the inactive page list

• Clustering

– to avoid random writes, assigns contiguous ranges of page slots to pages when they are written out

– Page Scanning

• the VM system must ensure that the rate at which it selects pages for eviction matches the write bandwidth of the swap device

• if the scanning rate is too high, Linux throttles page write-backs by waiting for 20-100 ms or until a write completes

– this timeout is too long for flash, so FlashVM instead times out after about 1 ms
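The clustering and throttling ideas above can be sketched in Python as follows. This is not FlashVM's kernel code: the swap map is modeled as a list of booleans (`True` means the slot is in use), `find_free_cluster` and `throttle` are hypothetical names, and a `threading.Event` stands in for write completion. Only the 32-page cluster and the roughly 1 ms timeout come from the slides.

```python
import threading
from typing import Optional

CLUSTER_SIZE = 32        # pages per cluster (from the slides)
FLASH_TIMEOUT_S = 0.001  # ~1 ms throttle timeout used by FlashVM


def find_free_cluster(swap_map: list[bool], cluster: int = CLUSTER_SIZE) -> Optional[int]:
    """Return the first slot of `cluster` contiguous free slots, or None.

    Writing a batch of evicted pages into one contiguous range turns
    many small random writes into one sequential write.
    """
    run_start, run_len = 0, 0
    for slot, in_use in enumerate(swap_map):
        if in_use:
            run_len = 0  # a used slot breaks the current run
            continue
        if run_len == 0:
            run_start = slot
        run_len += 1
        if run_len == cluster:
            return run_start
    return None


def throttle(write_done: threading.Event, timeout_s: float = FLASH_TIMEOUT_S) -> bool:
    """Pause page scanning until a write completes or the timeout expires.

    Returns True if the write completed before the timeout.
    """
    return write_done.wait(timeout_s)


swap_map = [True] * 3 + [False] * 2 + [True] + [False] * 32
print(find_free_cluster(swap_map))  # 6: the run at slots 3-4 is too short
```

A short timeout keeps the scanner from idling long after a fast flash write has already finished.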

• FlashVM Performance (Cont.)

– Prefetching on Page Fault

• The existing Linux prefetch mechanism reads in up to 8 pages that are contiguous on disk around the target page

– it often fetches fewer than 8 pages, because some slots in that range are free or bad

• FlashVM

– contiguous prefetching:

» seeks over the free/bad pages when prefetching to retrieve a full set of valid pages

– stride prefetching:

» records the offsets between the current target page and the last two faulting addresses

» uses these offsets to compute the strides for the next two pages expected to be referenced
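Both prefetching policies can be sketched in a few lines of Python. The slot states ("valid", "free", "bad"), the forward-only scan in `contiguous_prefetch`, and the rule that `StridePrefetcher` only fires when the two most recent strides agree are simplifying assumptions for illustration, not details taken from FlashVM; both names are hypothetical.

```python
def contiguous_prefetch(slots: list[str], target: int, n: int = 8) -> list[int]:
    """Collect up to n valid slots starting at the target, seeking past free/bad ones."""
    picked = []
    i = target
    while i < len(slots) and len(picked) < n:
        if slots[i] == "valid":
            picked.append(i)
        i += 1  # skip over free or bad slots instead of stopping short
    return picked


class StridePrefetcher:
    """Predict the next two pages from the offsets to the last two faults."""

    def __init__(self) -> None:
        self.history: list[int] = []  # last two faulting page numbers

    def on_fault(self, page: int) -> list[int]:
        prefetch: list[int] = []
        if len(self.history) == 2:
            stride = page - self.history[1]
            previous_stride = self.history[1] - self.history[0]
            if stride == previous_stride and stride != 0:
                prefetch = [page + stride, page + 2 * stride]
        self.history = (self.history + [page])[-2:]
        return prefetch


slots = ["valid", "free", "valid", "bad", "valid", "valid",
         "free", "valid", "valid", "valid", "valid"]
print(contiguous_prefetch(slots, target=0))  # [0, 2, 4, 5, 7, 8, 9, 10]

p = StridePrefetcher()
for fault in (10, 14, 18):
    hint = p.on_fault(fault)
print(hint)  # [22, 26]
```

Requiring two matching strides avoids prefetching on a single coincidental jump.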

• FlashVM Performance (Cont.)

– Disk Scheduling

• Existing Linux I/O schedulers optimize performance by

– (i) merging adjacent requests

– (ii) reordering requests to minimize seeks and to prioritize requests

– (iii) delaying requests to allow a process to submit new requests

• Work-conserving schedulers, such as the NOOP and deadline schedulers, submit pending requests as soon as the prior request completes

• Non-work-conserving schedulers may delay requests to wait for new requests with better locality or to distribute I/O bandwidth fairly between processes
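Optimization (i) above, merging adjacent requests, is easy to illustrate. The Python sketch below models each request as a hypothetical `(start_sector, length)` pair; it sorts the pending requests and coalesces any request that begins exactly where the previous one ends. Overlapping requests are not handled in this sketch.

```python
def merge_adjacent(requests: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort (start_sector, length) requests and merge physically adjacent ones."""
    merged: list[tuple[int, int]] = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)  # extend previous request
        else:
            merged.append((start, length))
    return merged


pending = [(8, 8), (0, 8), (32, 8), (16, 4)]
print(merge_adjacent(pending))  # [(0, 20), (32, 8)]
```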


Design and Implementation (4/7)

• FlashVM Reliability

– Page Sampling

• prioritizes reclaiming younger clean pages over older dirty pages, reducing the overall number of writes to the flash device

• the optimal rate for sampling dirty pages is strongly related to the memory reference pattern of the application

• adaptive page sampling predicts the average write rate by maintaining a moving average of the time interval tn for writing n dirty pages

– when tn is large (the application dirties few pages), FlashVM skips dirty pages more aggressively
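A minimal sketch of adaptive page sampling in Python, assuming an exponentially weighted moving average for tn and a linear mapping from the averaged tn to a skip probability for dirty pages. The class name, the smoothing weight `alpha`, and the thresholds `t_low`, `t_high`, and `max_skip` are hypothetical tuning knobs, not values from FlashVM; clean pages are never skipped.

```python
import random


class AdaptiveSampler:
    """Decide whether page reclaim should skip a dirty page (hypothetical policy)."""

    def __init__(self, alpha: float = 0.25, t_low: float = 0.01,
                 t_high: float = 1.0, max_skip: float = 0.9) -> None:
        self.alpha = alpha        # weight of the newest tn sample
        self.t_low = t_low        # at or below this tn (s), never skip dirty pages
        self.t_high = t_high      # at or above this tn (s), skip with max_skip
        self.max_skip = max_skip
        self.avg_tn = None        # moving average of tn, in seconds

    def record_interval(self, tn: float) -> None:
        """Fold in the time it took to write the latest n dirty pages."""
        if self.avg_tn is None:
            self.avg_tn = tn
        else:
            self.avg_tn = self.alpha * tn + (1 - self.alpha) * self.avg_tn

    def skip_probability(self) -> float:
        if self.avg_tn is None:
            return 0.0
        frac = (self.avg_tn - self.t_low) / (self.t_high - self.t_low)
        return self.max_skip * min(1.0, max(0.0, frac))

    def should_skip(self, dirty: bool, rng: random.Random) -> bool:
        # Large tn means few pages are being dirtied, so skip dirty pages more often.
        return dirty and rng.random() < self.skip_probability()


sampler = AdaptiveSampler()
sampler.record_interval(2.0)  # slow writer: tn well above t_high
print(sampler.skip_probability())  # 0.9
```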

• FlashVM Reliability (Cont.)

– Page Sharing

• zero pages (pages that contain only zero bytes) form a significant fraction of the memory-footprint of some application workloads

• FlashVM intercepts paging requests for all zero pages

– a swap-out request sets a zero flag in the corresponding page slot in the swap map and skips submitting a block I/O request

– a swap-in request checks the zero flag

» if it is set, FlashVM allocates a zero page in the address space of the application

– saving both the memory allocated for zero pages and the number of page write-backs to the flash device
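The zero-page path can be sketched in Python as below. `SwapMap`, its per-slot `zero` flags, and the dictionary standing in for the flash device are hypothetical simplifications; returning one shared immutable `ZERO_PAGE` object stands in for mapping a zero page into the application's address space.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)  # one shared, read-only all-zero page


class SwapMap:
    def __init__(self, nslots: int) -> None:
        self.zero = [False] * nslots         # per-slot zero flag
        self.device: dict[int, bytes] = {}   # stands in for the flash device
        self.writes = 0                      # block I/O requests submitted

    def swap_out(self, slot: int, page: bytes) -> None:
        if not any(page):                    # page holds only zero bytes
            self.zero[slot] = True           # set the flag; no block I/O
            return
        self.zero[slot] = False
        self.device[slot] = page
        self.writes += 1

    def swap_in(self, slot: int) -> bytes:
        if self.zero[slot]:
            return ZERO_PAGE                 # no read from flash needed
        return self.device[slot]


swap = SwapMap(nslots=4)
swap.swap_out(0, bytes(PAGE_SIZE))
swap.swap_out(1, b"\x01" + bytes(PAGE_SIZE - 1))
print(swap.writes)  # 1: only the non-zero page reached the device
```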

Design and Implementation (6/7)

• FlashVM Garbage Collection

– Discard Cost

• C0: fixed cost of discarding up to B0 blocks (55 ms)

• u: average utilization of valid pages in a block

• m: marginal cost of discarding each additional block
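The slide's cost formula is not preserved in this transcript. Reading the definitions above literally, one plausible model is that discarding b blocks costs C0 when b is at most B0, plus m for each block beyond B0. The Python sketch below encodes that assumed model. How the utilization u enters FlashVM's actual model is not recoverable here, so it is left out, and the B0 and m values in the example are hypothetical; only C0 = 55 ms comes from the slide.

```python
def discard_cost_ms(blocks: int, b0: int, m_ms: float, c0_ms: float = 55.0) -> float:
    """Assumed linear discard cost: fixed C0 covers up to B0 blocks,
    and each additional block adds the marginal cost m."""
    return c0_ms + m_ms * max(0, blocks - b0)


B0, M_MS = 16, 0.5  # hypothetical values for illustration only
one_at_a_time = sum(discard_cost_ms(1, B0, M_MS) for _ in range(100))
batched = discard_cost_ms(100, B0, M_MS)
print(one_at_a_time, batched)  # 5500.0 97.0
```

Under this assumed model the fixed cost dominates small discards, which suggests why batching them pays off.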

• FlashVM Garbage Collection (Cont.)

– Merged Discard

• FlashVM batches discards from multiple scans of the swap map

• it discards the largest possible range of free pages, up to 100 MB, in a single command

– Dummy Discard

• discard is only useful when it creates free blocks that can later be used for writes

• overwriting a block also causes the SSD to discard the old block contents, but without the high fixed cost of a discard command

– however, this gives up the benefit of discarding to maintain a pool of empty blocks

– Choosing between Merged and Dummy Discard

• FlashVM predicts the rate of allocation by estimating the expected time interval ts between two successive scans for a free page cluster

• when ts is small, FlashVM uses dummy discard; otherwise it applies merged discard
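Merged discard and the choice between the two strategies can be sketched together in Python. Free swap slots gathered over several scans are coalesced into contiguous ranges capped at 100 MB (4 KB pages assumed), and a moving average of ts selects dummy discard when free clusters are being consumed quickly. `merge_free_ranges`, `DiscardPolicy`, the smoothing weight, and the threshold are hypothetical.

```python
PAGE_SIZE = 4096
MAX_DISCARD_BYTES = 100 * 1024 * 1024  # 100 MB cap per discard command


def merge_free_ranges(free_slots, page_size=PAGE_SIZE, max_bytes=MAX_DISCARD_BYTES):
    """Coalesce free slot numbers into (start, count) ranges of at most max_bytes."""
    max_pages = max_bytes // page_size
    ranges: list[tuple[int, int]] = []
    for slot in sorted(set(free_slots)):
        if ranges:
            start, count = ranges[-1]
            if start + count == slot and count < max_pages:
                ranges[-1] = (start, count + 1)  # extend the current range
                continue
        ranges.append((slot, 1))  # gap or size cap reached: start a new range
    return ranges


class DiscardPolicy:
    def __init__(self, threshold_s: float, alpha: float = 0.25) -> None:
        self.threshold_s = threshold_s  # hypothetical cut-off for a "small" ts
        self.alpha = alpha
        self.avg_ts = None              # moving average of ts, in seconds

    def record_scan_interval(self, ts: float) -> None:
        if self.avg_ts is None:
            self.avg_ts = ts
        else:
            self.avg_ts = self.alpha * ts + (1 - self.alpha) * self.avg_ts

    def choose(self) -> str:
        # Small ts: freed slots are reused soon, so overwriting (dummy discard) suffices.
        if self.avg_ts is not None and self.avg_ts < self.threshold_s:
            return "dummy"
        return "merged"


print(merge_free_ranges([5, 1, 2, 3, 7, 8]))  # [(1, 3), (5, 1), (7, 2)]
policy = DiscardPolicy(threshold_s=0.05)
policy.record_scan_interval(0.01)
print(policy.choose())  # dummy
```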

Evaluation (1/6)

• All tests run on a

– 2.5GHz Intel Core 2 Quad system

– 4GB DDR2 DRAM

• compare 4 storage devices

• 4 memory-intensive application workloads

– ImageMagick 6.3.7

– Spin 5.2

– pseudo-SpecJBB

– memcached 1.4.1

• demand-paging the first process into memory incurs over 16,000 page faults, taking 11.5 s with disk and 3.5 s with flash

(results are not comparable across devices)

For the remaining tests:

• number of pages pre-cleaned = 32 pages

• cluster size = 32 pages

• I/O scheduler: NOOP

Evaluation (4/6)

Figure: Percentage reduction and speedup.

Evaluation (5/6)

• Linux with discard is 12 times slower than the baseline

• FlashVM with merged discard is 15% slower than baseline

• FlashVM with dummy discard is 11% slower than baseline

Evaluation (6/6)

On average, FlashVM reduces run time by 82% and memory usage by 60% relative to DiskVM
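A percentage reduction in run time and a speedup factor describe the same change, and the Evaluation (4/6) slide appears to plot both. The conversion is speedup = 1 / (1 - r/100); the small Python helper below (a hypothetical name) applies it, showing that an 82% reduction in run time corresponds to roughly a 5.6x speedup.

```python
def reduction_to_speedup(pct_reduction: float) -> float:
    """Convert a percentage reduction in run time into a speedup factor."""
    if not 0 <= pct_reduction < 100:
        raise ValueError("reduction must be in [0, 100)")
    remaining = 1.0 - pct_reduction / 100.0  # fraction of the original run time left
    return 1.0 / remaining


print(round(reduction_to_speedup(82), 2))  # 5.56
```

The conversion applies only to run time; the 60% memory reduction is a separate measure and has no speedup equivalent.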

Conclusion

• FlashVM adapts the Linux virtual memory system to the performance, reliability, and garbage collection characteristics of flash storage

• As new storage technologies with yet different performance characteristics become available, it is important to revisit both operating system and application designs