Managing Caching for I/O Jeff Chase Duke University.
![Page 1: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/1.jpg)
Managing Caching for I/O
Jeff Chase, Duke University
![Page 2: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/2.jpg)
Memory as a cache
[Figure labels: memory (frames); data; virtual address spaces; files and filesystems, databases, other storage objects; page/block read/write accesses; backing storage volumes (pages and blocks); disk and other storage; network RAM]
Processes access external storage objects through file
APIs and VM abstraction. The OS kernel manages caching
of pages/blocks in main memory.
![Page 3: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/3.jpg)
The DeFiler buffer cache
DBufferCache: DBuffer dbuf = getBlock(blockID), releaseBlock(dbuf)
DBuffer: read(), write(), startFetch(), startPush(), waitValid(), waitClean()
Device I/O interface
– Asynchronous I/O to/from buffers
– Block read and write
– Blocks numbered by blockIDs
File abstraction implemented in upper DFS layer.
– All knowledge of how files are laid out on disk is at this layer.
– Access underlying disk volume through buffer cache API.
– Obtain buffers (dbufs), write/read to/from buffers, orchestrate I/O.
![Page 4: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/4.jpg)
Managing files
create, destroy, read, write a dfile; list dfiles
DBufferCache: DBuffer dbuf = getBlock(blockID), releaseBlock(dbuf), sync()
DBuffer: read(), write(), startFetch(), startPush(), waitValid(), waitClean()
“inode” for DFileID
1. Fetch blocks for data and metadata (or zero new ones fresh) into cache buffers (dbufs).
2. Copy bytes to/from dbufs with read and write.
3. Track which data/metadata blocks are valid, and which valid blocks are clean and which are dirty.
4. Clean the dirty blocks by writing them back to the disk with push.
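The four steps above can be sketched with a toy buffer that tracks its state in two flags. This is a minimal synchronous sketch under stated assumptions, not the DeFiler implementation: the `Disk` class and the 16-byte block size are hypothetical, the method names are Python-style renderings of the slide's `startFetch`, `startPush`, `read`, and `write`, and fetch and push complete immediately here instead of running asynchronously.

```python
BLOCK_SIZE = 16  # assumed toy block size


class Disk:
    """Hypothetical in-memory stand-in for a disk volume."""

    def __init__(self):
        self.blocks = {}

    def read_block(self, block_id):
        # Blocks that were never written read back as zeros.
        return bytearray(self.blocks.get(block_id, bytes(BLOCK_SIZE)))

    def write_block(self, block_id, data):
        self.blocks[block_id] = bytes(data)


class DBuffer:
    """Toy dbuf: holds one block and tracks valid/dirty status."""

    def __init__(self, disk, block_id):
        self.disk = disk
        self.block_id = block_id
        self.data = bytearray(BLOCK_SIZE)
        self.valid = False  # True once the buffer holds the block's contents
        self.dirty = False  # True if the buffer is newer than the disk copy

    def start_fetch(self):
        # Step 1: bring the block into the buffer (synchronous here).
        self.data = self.disk.read_block(self.block_id)
        self.valid, self.dirty = True, False

    def read(self, offset, count):
        # Step 2: copy bytes out of a valid buffer.
        assert self.valid, "fetch the block before reading"
        return bytes(self.data[offset:offset + count])

    def write(self, offset, payload):
        # Steps 2 and 3: copy bytes in and mark the buffer dirty.
        assert self.valid, "fetch (or zero) the block before writing"
        assert offset + len(payload) <= BLOCK_SIZE, "write past end of block"
        self.data[offset:offset + len(payload)] = payload
        self.dirty = True

    def start_push(self):
        # Step 4: clean the buffer by writing it back to disk.
        if self.dirty:
            self.disk.write_block(self.block_id, self.data)
            self.dirty = False
```

In this sketch a write reaches the disk only when `start_push` runs, which is the point of tracking dirty blocks separately from valid ones.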
![Page 5: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/5.jpg)
Page/block cache internals
HASH(blockID)
cache directory
List(s) of free buffers (bufs) or eviction candidates. These dbufs might be listed in the cache directory if they contain useful data, or not, if they are truly free.
To replace a dbuf:
1. Remove from free/eviction list.
2. Remove from cache directory.
3. Change dbuf blockID and status.
4. Enter in directory w/ new blockID.
5. Re-register on eviction list.
Beware of concurrent accesses.
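The replacement steps for a dbuf can be sketched with a dictionary as the cache directory and an ordered list of eviction candidates. This is a single-threaded sketch under assumptions: the `Buf` and `BufferCache` classes and the oldest-first victim choice are illustrative, and a real cache would need locking and would have to skip buffers that are busy or held by a caller.

```python
from collections import OrderedDict


class Buf:
    """Toy buffer: just a blockID tag and a status string."""

    def __init__(self):
        self.block_id = None
        self.status = "free"


class BufferCache:
    def __init__(self, nbufs):
        self.directory = {}            # blockID -> Buf (hash lookup)
        self.eviction = OrderedDict()  # id(buf) -> Buf, oldest candidate first
        for _ in range(nbufs):
            buf = Buf()
            self.eviction[id(buf)] = buf

    def get_block(self, block_id):
        buf = self.directory.get(block_id)
        if buf is not None:
            # Hit: refresh the buffer's position on the eviction list.
            self.eviction.move_to_end(id(buf))
            return buf
        # Miss: replace the oldest candidate.
        _, buf = self.eviction.popitem(last=False)  # remove from eviction list
        if buf.block_id is not None:
            del self.directory[buf.block_id]        # remove from directory
        buf.block_id = block_id                     # change blockID
        buf.status = "invalid"                      # contents not yet fetched
        self.directory[block_id] = buf              # enter under new blockID
        self.eviction[id(buf)] = buf                # re-register as candidate
        return buf
```

A truly free buffer has no directory entry, so the sketch only deletes from the directory when the victim still carries a blockID.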
![Page 6: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/6.jpg)
Anatomy of a read
1. Compute (user mode).
2. Enter kernel for read syscall.
3. getBlock for maps, traverse cached maps, getBlock for data, and start fetch.
4. Sleep for I/O (stall).
5. Copy data to user buffer in read.
6. Return to user program.
[Figure: timeline of CPU and disk activity over time; the disk performs a seek and then a transfer.]
![Page 7: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/7.jpg)
Prefetching for high read throughput
• Read-ahead (prefetching)
– Fetch blocks into the cache in expectation that they will be used.
– Requires prediction. Common for sequential access.
1. Detect access pattern.
2. Start prefetching.
Reduce I/O stalls.
![Page 8: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/8.jpg)
Sequential read-ahead
[Figure: blocks n, n+1, n+2, n+3 in sequence. App requests block n; app requests block n+1; system prefetches block n+2; system prefetches block n+3.]
• Prediction is easy for sequential access.
• Read-ahead also helps reduce seeks by reading larger chunks if data is laid out sequentially on disk.
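A rough sketch of sequential read-ahead follows. It assumes a hypothetical `ReadAhead` class with a `depth` parameter (blocks to prefetch), treats a request for block n+1 right after block n as the sequential pattern, and optimistically assumes every prefetch finishes before the app asks for the block. Eviction is ignored.

```python
class ReadAhead:
    """Toy read-ahead: prefetch when two consecutive blocks are requested."""

    def __init__(self, depth=2):
        self.depth = depth    # assumed: number of blocks to prefetch ahead
        self.last = None      # most recently requested block
        self.cached = set()   # blocks present (or being fetched) in the cache
        self.stalls = 0       # demand fetches the app had to wait for

    def request(self, n):
        if n not in self.cached:
            self.stalls += 1  # demand miss: the app stalls for this fetch
            self.cached.add(n)
        if self.last is not None and n == self.last + 1:
            # Sequential pattern detected: start fetching the next blocks.
            for k in range(n + 1, n + 1 + self.depth):
                self.cached.add(k)
        self.last = n


ra = ReadAhead(depth=2)
for block in range(10, 20):
    ra.request(block)
```

Under these assumptions only the first two requests of a sequential run stall; every later block has already been prefetched by the time it is requested.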
![Page 9: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/9.jpg)
Page/block Caching Policy
Each thread/process/job utters a stream of page/block references.
– reference string: e.g., abcabcdabce..
The OS tries to minimize the number of fetches/faults.
– Try to arrange for the resident set of blocks to match the set of blocks that will be needed in the near future.
Replacement policy is the key.
– On each miss, select a victim block to evict from memory; read the new block into the victim’s frame/dbuf.
– Simple: replace the page whose next reference is furthest in the future (OPT). It is provably optimal, but it requires knowledge of future references.
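OPT can be simulated offline when the whole reference string is known in advance. The sketch below counts fetches for the slide's example string; the function name `opt_faults` and the choice of three frames are assumptions for illustration.

```python
def opt_faults(refs, frames):
    """Count fetches under OPT: evict the block whose next use is furthest away."""
    resident = set()
    faults = 0
    for i, block in enumerate(refs):
        if block in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]

            def next_use(b):
                # Blocks never referenced again are the best victims.
                return future.index(b) if b in future else len(future)

            resident.remove(max(resident, key=next_use))
        resident.add(block)
    return faults


faults = opt_faults("abcabcdabce", 3)
```

With three frames the string costs 6 fetches under OPT, which gives a lower bound for comparing practical policies on the same string.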
![Page 10: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/10.jpg)
Selecting a victim: policy
• The oldest block? (FIFO policy)
• The coldest block? (Least Recently Used)
• The hottest block? (Most Recently Used)
• The least popular block? (Least Frequently Used)
• A random block?
• A block that has not been used recently?
X Y Z A Z B C D E Z A B C D E
![Page 11: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/11.jpg)
Selecting a victim: policy
• The oldest block? (FIFO policy)
– With 3 buffers: X Y Z A (evict X) Z (hit) B (evict Y) C (evict Z) Z (miss)…
• The coldest block? (Least Recently Used)
– With 3 buffers: X Y Z A (evict X) Z (hit) B (evict Y) C (evict A) Z (hit)…
• The hottest block? (Most Recently Used)
– Consider: A B C D E A B C D E …
• The least popular block? (Least Frequently Used)
• A random block?
• A block that has not been used recently?
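The policies can be compared with a small simulator. This is a sketch under assumptions: the function name `count_misses` is hypothetical, the cache sizes (3 and 4 blocks) are chosen for illustration, and only FIFO, LRU, and MRU are modeled.

```python
def count_misses(refs, frames, policy):
    """Simulate a cache of `frames` blocks; policy is 'FIFO', 'LRU', or 'MRU'."""
    order = []  # resident blocks; index 0 is the next FIFO/LRU victim
    misses = 0
    for block in refs:
        if block in order:
            if policy != "FIFO":
                # LRU and MRU track recency: move the block to the tail.
                order.remove(block)
                order.append(block)
            continue
        misses += 1
        if len(order) == frames:
            if policy == "MRU":
                order.pop()    # evict the tail (hottest block)
            else:
                order.pop(0)   # evict the head (oldest or coldest block)
        order.append(block)
    return misses


fifo = count_misses("XYZAZBCZ", 3, "FIFO")   # Z is evicted despite being hot
lru = count_misses("XYZAZBCZ", 3, "LRU")     # Z stays resident
scan_lru = count_misses("ABCDEABCDE", 4, "LRU")
scan_mru = count_misses("ABCDEABCDE", 4, "MRU")
```

On the cyclic scan A B C D E with a 4-block cache, LRU misses on every reference while MRU keeps most of the loop resident, which is why the hottest block can be a sensible victim for sequential access.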
![Page 12: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/12.jpg)
Replacement policy: file systems
• File systems often use a variant of LRU.
– A file system sees every block access (through syscall API), so it can do full LRU: move block to tail of LRU list on each access.
• Sequential files have a cache wiping effect with LRU.
– Most file systems detect sequential access and prefer eviction of blocks from the same file, e.g., using MRU.
– That also prevents any one file/object from consuming more than its “fair share” of the cache.
![Page 13: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/13.jpg)
VM systems
• VM memory management is similar to file systems.
– Page caching in physical memory frames
– Unified with file block caching in most systems
– Virtual address space is a collection of regions/segments, which may be considered as “objects” similar to files.
• Only it’s different.
– Mapped by page tables
– VM system software does not see most references, since they are accelerated by Memory Management Unit hardware.
– Requires a sampling approximation for page replacement.
– All data goes away on a system failure: no write atomicity.
![Page 14: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/14.jpg)
VM Page Tables: Cartoon View
[Figure: a user virtual address is split into page #i and an offset; the process page table (map) maps page #i to PFN i among the physical memory page frames (PFN 0, PFN 1, …, PFN i), and PFN i + offset gives the physical location.]
In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page.
Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table.
The maps are themselves stored in memory; a protected register holds a pointer to the current map.
![Page 15: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/15.jpg)
Example: Windows/IA32
• Each address space has a page directory
• One page: 4K bytes, 1024 4-byte entries (PTEs)
• Each PDIR entry points to a “page table”
• Each “page table” is one page with 1024 PTEs
• Each PTE maps one 4K page of the address space
• Each page table maps 4MB of memory: 1024*4K
• One PDIR for a 4GB address space, max 4MB of tables
• Load PDIR base address into a register to activate the VAS
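The arithmetic behind this layout can be checked with a short sketch. The 10/10/12 bit split follows from 1024-entry tables and 4K pages; the names `split_va`, `PAGE_SHIFT`, and `INDEX_BITS` are assumptions for illustration and are not taken from any Windows or IA32 interface.

```python
PAGE_SHIFT = 12   # 4K pages: the low 12 bits are the byte offset
INDEX_BITS = 10   # 1024 entries per page directory or page table
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_va(va):
    """Split a 32-bit virtual address into (PDIR index, table index, offset)."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    pt_index = (va >> PAGE_SHIFT) & INDEX_MASK
    pd_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    return pd_index, pt_index, offset


# Sizes quoted on the slide, recomputed from the constants.
PAGE_SIZE = 1 << PAGE_SHIFT                      # 4096 bytes
BYTES_PER_TABLE = (1 << INDEX_BITS) * PAGE_SIZE  # memory mapped by one table
MAX_TABLE_BYTES = (1 << INDEX_BITS) * PAGE_SIZE  # 1024 tables of one page each

parts = split_va(0x00403004)
```

For the address 0x00403004 the split is PDIR entry 1, page table entry 3, and offset 4.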
![Page 16: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/16.jpg)
Two-level page tables
[Figure from Tanenbaum: 32-bit address with 2 page table fields; top-level page table.]
![Page 17: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/17.jpg)
Virtual Address Translation
Example: typical 32-bit architecture with 4KB pages.
[Figure: a virtual address is split into a VPN and a 12-bit offset; address translation replaces the VPN with a PFN, and PFN + offset forms the physical address.]
Virtual address translation maps a virtual page number (VPN) to a physical page frame number (PFN): the rest is easy.
Deliver exception to OS if translation is not valid and accessible in requested mode.
![Page 18: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/18.jpg)
Virtual Addressing: Under the Hood
[Flowchart boxes: start here; probe TLB; probe page table; load TLB; access valid?; access physical memory; raise exception; page fault?; signal process; allocate frame; page on disk?; fetch from disk; zero-fill; load TLB. The boxes are divided between the MMU and the OS.]
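The flow can be sketched as a toy translation routine. Everything here is an assumption for illustration: the `ToyVM` class, the unbounded TLB, the never-exhausted frame list, and the split of steps between MMU and OS in the comments, which varies by architecture (some MMUs walk the page table in hardware, others trap to software on a TLB miss).

```python
class AccessViolation(Exception):
    """Raised when the access is not permitted; the OS would signal the process."""


class ToyVM:
    def __init__(self, legal_vpns, on_disk):
        self.legal = set(legal_vpns)  # pages the process may touch
        self.on_disk = dict(on_disk)  # vpn -> saved page contents
        self.page_table = {}          # vpn -> pfn for resident pages
        self.tlb = {}                 # cache of page_table entries
        self.frames = []              # frames[pfn] holds the page contents
        self.faults = 0

    def translate(self, vpn):
        if vpn in self.tlb:                # MMU: probe TLB
            return self.tlb[vpn]
        if vpn in self.page_table:         # MMU: probe page table, load TLB
            self.tlb[vpn] = self.page_table[vpn]
            return self.tlb[vpn]
        return self.handle_fault(vpn)      # raise exception to the OS

    def handle_fault(self, vpn):
        if vpn not in self.legal:          # OS: is the access valid?
            raise AccessViolation(vpn)     # signal process
        self.faults += 1
        # OS: allocate a frame, then fetch from disk or zero-fill.
        contents = self.on_disk.get(vpn, bytes(4096))
        self.frames.append(contents)
        pfn = len(self.frames) - 1
        self.page_table[vpn] = pfn
        self.tlb[vpn] = pfn                # load TLB and resume
        return pfn
```

A second access to the same page is served from the TLB without involving the fault handler, which is why the OS sees only a small fraction of references.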
![Page 19: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/19.jpg)
LRU Approximations for Paging
• Pure LRU and LFU are prohibitively expensive to implement.
– most references are hidden by the TLB
– OS typically sees less than 10% of all references
– can’t tweak your ordered page list on every reference
• Most systems rely on an approximation to LRU for paging.
– periodically sample the reference bit on each page
• visit page and set reference bit to zero
• run the process for a while (the reference window)
• come back and check the bit again
– reorder the list of eviction candidates based on sampling
![Page 20: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/20.jpg)
VM page replacement
• Try to guess the working set of pages in active use for each VAS.
• To determine if a page is being used, arrange for MMU to notify OS on next use.
– E.g., reference bit, or disable read access to trigger a fault.
• Sample pages systematically to approximate LRU: e.g., CLOCK algorithm, or FIFO-with-Second-Chance (FIFO-2C)
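A minimal sketch of the CLOCK idea follows, assuming one reference bit per frame and a hand that sweeps the frames in a ring. The `Clock` class and its `touch` method are hypothetical names; in a real system the MMU sets the reference bit and the OS only inspects and clears it during the sweep.

```python
class Clock:
    """Toy CLOCK: frames on a ring, one reference bit per frame."""

    def __init__(self, nframes):
        self.pages = [None] * nframes  # page held by each frame
        self.ref = [False] * nframes   # reference bit, set on use
        self.hand = 0

    def touch(self, page):
        """Reference `page`; return the evicted page, or None."""
        if page in self.pages:
            self.ref[self.pages.index(page)] = True  # the MMU would set this
            return None
        # Fault: sweep the hand, clearing bits, until a bit is already clear.
        while self.ref[self.hand]:
            self.ref[self.hand] = False              # give a second chance
            self.hand = (self.hand + 1) % len(self.pages)
        victim = self.pages[self.hand]
        self.pages[self.hand] = page
        self.ref[self.hand] = True
        self.hand = (self.hand + 1) % len(self.pages)
        return victim
```

A page whose bit was set since the last sweep survives one pass of the hand, so recently used pages tend to stay resident without the OS observing every reference.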
![Page 21: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/21.jpg)
Page fault rate by resident set size
![Page 22: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/22.jpg)
Page fault rate over time
Threads in an address space may change their working sets.
![Page 23: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/23.jpg)
FIFO-2C in Action (FreeBSD)
![Page 24: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/24.jpg)
What Do the Pretty Colors Mean?
This is a plot of selected internal kernel events during a run of a process that randomly reads/writes its virtual memory.
– x-axis: time in milliseconds (total run is about 3 seconds)
– y-axis: each event pertains to a physical page frame, whose PFN is given on the y-axis
– events: page allocate (yellow-green), page free (red), deactivation (duke blue), reactivation (lime green), page clean (carolina blue).
The machine is an Alpha with 8000 8KB pages (64MB total).
The process uses 48MB of virtual memory: this forces the paging daemon to do FIFO-2C bookkeeping, but causes little actual paging.
![Page 25: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/25.jpg)
What to Look For
– Some low physical memory ranges are reserved to the kernel.
– Process starts and soaks up memory that was initially free.
– Paging daemon evicts pages allocated to other processes, and the system reallocates the frames to the test process.
– After an initial flurry of demand-loading activity, things settle down after most of the process memory is resident.
– Paging daemon begins to scan more frequently as memory becomes overcommitted (dark blue deactivation stripes).
– Test process touches pages deactivated by the paging daemon, causing them to be reactivated.
– Test process exits (heavy red bar).
![Page 26: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/26.jpg)
page alloc
![Page 27: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/27.jpg)
deactivate
![Page 28: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/28.jpg)
activate
![Page 29: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/29.jpg)
clean
![Page 30: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/30.jpg)
free
![Page 31: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/31.jpg)
“Filers”
• Network-attached (IP)
• RAID appliance
• Multiple protocols
– iSCSI, NFS, CIFS
• Admin interfaces
• Flexible configuration
• Lots of virtualization: dynamic volumes
• Volume cloning, mirroring, snapshots, etc.
• NetApp technology leader since 1994 (WAFL)
![Page 32: Managing Caching for I/O Jeff Chase Duke University.](https://reader035.fdocuments.net/reader035/viewer/2022062423/56649f265503460f94c3dd63/html5/thumbnails/32.jpg)
Network File System
[ucla.edu]