Practical, Transparent Operating System Support for Superpages
![Page 1: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/1.jpg)
Practical, Transparent Operating System Support for Superpages
J. Navarro, Rice University and Universidad Católica de Chile
S. Iyer, P. Druschel, A. Cox, Rice University
![Page 2: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/2.jpg)
Paper Highlights
• Presents a general, efficient mechanism to manage pages of different sizes in a VM system
  – Superpages
• Objective is to address the limitations of extant translation lookaside buffers (TLB).
![Page 3: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/3.jpg)
The translation lookaside buffer (I)
• Small high-speed memory
  – Contains a fixed number of page table entries
  – Content-addressable memory
• Entries include page frame number and page number
[Figure: TLB entry layout with fields labeled page number, page frame number, and bits]
![Page 4: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/4.jpg)
The translation lookaside buffer (II)
• Usually fully associative
  – Not always true (see Intel Nehalem)
• Considerably fewer entries than an L1 cache
  – Speed considerations
![Page 5: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/5.jpg)
Realizations (I)
• TLB of UltraSPARC III
  – 64-bit addresses
    • Maximum program size is 2^44 bytes, that is, 16 TB
  – Supported page sizes are 4 KB, 16 KB, 64 KB, 4 MB ("superpages")
  – External L2 cache had a maximum capacity of 8 MB.
Do not even attempt to memorize this!
![Page 6: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/6.jpg)
Realizations (II)
• TLB of UltraSPARC III
  – Dual direct-mapped TLB
    • 64 entries for code pages
    • 64 entries for data pages
  – Each entry occupies 64 bits
    • Page number and page frame number
    • Context
    • Valid bit, dirty bit, …
Do not even attempt to memorize this!
![Page 7: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/7.jpg)
Realizations (III)
• Intel Nehalem Architecture:
  – Two-level TLB:
    • First level:
      – Two parts
        • Data TLB has 64 entries for 4K pages or 32 for big pages (2M/4M)
        • Instruction TLB has 128 entries for 4K pages and 7 for big pages.
Do not even attempt to memorize this!
![Page 8: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/8.jpg)
Realizations (IV)
• Second level:
  – Unified cache
  – Can store up to 512 entries
  – Operates only with 4K pages
Do not even attempt to memorize this!
![Page 9: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/9.jpg)
The main problem
• TLB sizes have not grown with the sizes of main memories
• Define TLB coverage as the amount of main memory that can be accessed without incurring TLB misses
  – Typically one megabyte or less
• Relative TLB coverage is the fraction of main memory that can be accessed without incurring TLB misses
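The two definitions can be written compactly. As a sketch, assume a TLB whose $N$ entries all map pages of one size $P$, on a machine with $M$ bytes of main memory and coverage smaller than $M$ (the symbols $N$, $P$ and $M$ are introduced here only for illustration and do not come from the slides):

```latex
\[
\text{TLB coverage} = N \times P,
\qquad
\text{relative TLB coverage} = \frac{N \times P}{M}
\]
```

Under these assumptions, coverage can only grow by raising $N$ (more entries) or $P$ (larger pages), which are the two options the following slides weigh.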
![Page 10: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/10.jpg)
Back to our examples
• UltraSPARC III
  – with 4 KB pages:
    • (64 + 64) × 4 KB = 512 KB
  – with 16 KB pages:
    • (64 + 64) × 16 KB = 2 MB
Do not even attempt to memorize this!
![Page 11: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/11.jpg)
Back to our examples
• Intel Nehalem
  – with 4 KB pages:
    • Level 1:
      – (64 + 128) × 4 KB = 768 KB
    • Level 2:
      – 512 × 4 KB = 2 MB
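The coverage figures on these two slides are plain products of entry count and page size. The short Python check below reproduces them; the helper name `tlb_coverage` is made up for this illustration, and the last line is an extra data point (the same 128 entries mapping 4 MB superpages) that is not on the slides.

```python
KB = 1024
MB = 1024 * KB


def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes reachable without a TLB miss when every entry maps one page."""
    return entries * page_size


# UltraSPARC III: 64 code entries + 64 data entries
ultrasparc_4k = tlb_coverage(64 + 64, 4 * KB)
ultrasparc_16k = tlb_coverage(64 + 64, 16 * KB)

# Intel Nehalem with 4 KB pages
nehalem_l1 = tlb_coverage(64 + 128, 4 * KB)
nehalem_l2 = tlb_coverage(512, 4 * KB)

# Extra illustration: the same 128 entries, each mapping a 4 MB superpage
ultrasparc_4m = tlb_coverage(64 + 64, 4 * MB)

print(ultrasparc_4k // KB, "KB")   # 512 KB
print(ultrasparc_16k // MB, "MB")  # 2 MB
print(nehalem_l1 // KB, "KB")      # 768 KB
print(nehalem_l2 // MB, "MB")      # 2 MB
print(ultrasparc_4m // MB, "MB")   # 512 MB
```

The last figure hints at why larger pages are attractive: coverage scales linearly with page size while the number of entries stays fixed.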
![Page 12: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/12.jpg)
Evolution of relative TLB coverage
![Page 13: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/13.jpg)
Consequences
• Processes with very large working sets incur too many TLB misses
  – "Significant performance penalty"
• Some machines have L2 caches bigger than their TLB coverage
  – Can have TLB misses for data already in L2 cache
![Page 14: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/14.jpg)
Solutions (I)
• Increase TLB size:
  – Would increase TLB access time
  – Would slow down memory accesses
• Increase page sizes:
  – Would cause increased memory fragmentation and poor utilization of main memory
![Page 15: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/15.jpg)
Solutions (II)
• Use multiple page sizes:
  – Keep a relatively small "base" page size
    • Say 4 KB
  – Let them coexist with much larger page sizes
    • Superpages
  – Intel Nehalem solution
![Page 16: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/16.jpg)
Hardware limitations (I)
• Superpage sizes must be supported by hardware:
  – 4 KB, 16 KB, 64 KB, 4 MB for UltraSPARC III
  – 4 KB, 2 MB and 4 MB for Intel Nehalem
  – Ten possible page sizes from 4 KB to 256 MB for Intel Itanium
![Page 17: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/17.jpg)
Hardware limitations (II)
• Superpages must be contiguous and properly aligned in both virtual and physical address spaces
• Single TLB entry for each superpage
  – All its base pages must have
    • Same protection attributes
    • Same clean/dirty status
  – Will cause problems
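These hardware constraints can be expressed as one eligibility test. The Python sketch below is only an illustration: the `BasePage` record and the function `can_form_superpage` are hypothetical names, and page numbers are counted in base pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasePage:
    vpn: int     # virtual page number
    pfn: int     # physical page frame number
    prot: str    # protection attributes, e.g. "rw-"
    dirty: bool  # clean/dirty status


def can_form_superpage(pages: list, pages_per_superpage: int) -> bool:
    """Check contiguity, alignment and uniform attributes for one superpage."""
    if pages_per_superpage <= 0 or len(pages) != pages_per_superpage:
        return False
    ordered = sorted(pages, key=lambda p: p.vpn)
    first = ordered[0]
    # Alignment is required in both the virtual and the physical address space.
    if first.vpn % pages_per_superpage or first.pfn % pages_per_superpage:
        return False
    for i, page in enumerate(ordered):
        # Contiguity in both address spaces.
        if page.vpn != first.vpn + i or page.pfn != first.pfn + i:
            return False
        # One TLB entry means one set of attributes and one dirty bit.
        if page.prot != first.prot or page.dirty != first.dirty:
            return False
    return True


good = [BasePage(8 + i, 16 + i, "rw-", False) for i in range(8)]
misaligned = [BasePage(8 + i, 17 + i, "rw-", False) for i in range(8)]
mixed = good[:7] + [BasePage(15, 23, "r--", False)]

print(can_form_superpage(good, 8))        # True
print(can_form_superpage(misaligned, 8))  # False
print(can_form_superpage(mixed, 8))       # False
```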
![Page 18: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/18.jpg)
Issues and trade-offs
![Page 19: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/19.jpg)
Allocation
• When we bring a page into main memory, we can
  – Put it anywhere in RAM
    • Will need to relocate it to a suitable place when we merge it into a superpage
  – Put it in a location that would let us "grow" a superpage around it: reservation-based allocation
    • Must pick a maximum size for the superpage
![Page 20: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/20.jpg)
Fragmentation control
• The OS must keep contiguous chunks of memory available at all times
  – OS will break previous reservation commitments if the superpage is unlikely to materialize
  – Must "treat contiguity as a potentially contended resource"
![Page 21: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/21.jpg)
Promotion
• Once a sufficient number of base pages within a potential superpage have been allocated, the OS may elect to promote them into a superpage. This requires
  – Updating PTEs for all base pages in the new superpage
  – Bringing the missing base pages into main memory
![Page 22: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/22.jpg)
Promotion
• Promotion can be incremental
  – Progressively larger and larger superpages
[Figure: rows of page frames labeled "In use" and "Free", followed by a row in which some frames have been merged into a "Superpage"]
![Page 23: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/23.jpg)
Demotion
• OS should disband or reduce the size of a superpage whenever some portions of it fall into disuse
• Main problem is that OS can only track accesses at the level of the superpage
![Page 24: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/24.jpg)
Eviction
• Not that different from expelling individual base pages
  – Must flush out all base pages of any superpage containing dirty pages
    • OS cannot ascertain which base pages remain clean
![Page 25: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/25.jpg)
Related approaches
• Many OS kernels use superpages
• Focus here is on application memory
![Page 26: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/26.jpg)
Reservations
• Talluri and Hill:
  – propose a reservation-based scheme
  – reservations can be preempted
  – emphasis is on partial subblocks
• HP-UX and IRIX
  – Create superpages at page fault time
  – User must specify a preferred per-segment page size
![Page 27: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/27.jpg)
Page relocation
• Relocation-based schemes
  – Let base pages reside any place in main memory
  – Migrate these pages to a contiguous region in main memory when they find out that superpages are "likely to be beneficial."
• Disadvantage: cost of copying base pages
• Advantage: "more robust to fragmentation"
![Page 28: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/28.jpg)
Hardware support
• Two proposals
  – Having multiple valid bits in each TLB entry
    • Would allow small superpages to contain missing base pages
    • Partial subblocking (Talluri and Hill)
  – Adding an additional level of address translation in the memory controller
    • Would "eliminate the contiguity requirement for superpages" (Fang et al.)
![Page 29: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/29.jpg)
Design
![Page 30: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/30.jpg)
Allocation
• Use
  – a reservation-based scheme for superpages
    • assumes a preferred superpage size for a given range of addresses
  – a buddy system to manage main memory
    • Think of scheme used to manage block fragments in Unix FFS
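For readers who have not met a buddy system, the sketch below shows the general technique in Python: blocks are powers of two in size, a block is split in half until it fits the request, and a freed block is merged with its "buddy" whenever the buddy is also free. This is a textbook-style illustration, not the allocator from the paper; the class name `BuddyAllocator` and the frame numbering are assumptions.

```python
class BuddyAllocator:
    """Minimal buddy allocator over 2**max_order page frames."""

    def __init__(self, max_order: int):
        self.max_order = max_order
        # free_lists[k] holds the start frames of free blocks of 2**k frames.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)

    def alloc(self, order: int):
        """Return the start frame of a block of 2**order frames, or None."""
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None
        start = min(self.free_lists[k])
        self.free_lists[k].remove(start)
        # Split until the block has the requested size; upper halves stay free.
        while k > order:
            k -= 1
            self.free_lists[k].add(start + (1 << k))
        return start

    def free(self, start: int, order: int) -> None:
        """Release a block and coalesce it with free buddies."""
        while order < self.max_order:
            buddy = start ^ (1 << order)  # buddy differs in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)


allocator = BuddyAllocator(max_order=3)  # 8 frames in total
a = allocator.alloc(0)                   # one frame
b = allocator.alloc(1)                   # two contiguous, aligned frames
allocator.free(a, 0)
allocator.free(b, 1)
print(a, b, allocator.free_lists[3])     # 0 2 {0}
```

Because every block is aligned to its own size, any free block of the right order can host a superpage directly, which is what makes a buddy system a natural fit here.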
![Page 31: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/31.jpg)
Preferred superpage size (I)
• For fixed-size memory objects, pick largest aligned superpage that
  – Contains the faulting base page
  – Does not overlap with other superpages or tentative superpages
  – Does not extend over the boundaries of the object
![Page 32: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/32.jpg)
Preferred superpage size (II)
• For dynamically sized memory objects, pick largest aligned superpage that
  – Contains the faulting base page
  – Does not overlap with other superpages or tentative superpages
  – Does not exceed the current size of the object
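The two selection rules can be sketched as a single function. Everything below is an assumption made for illustration: the function name `preferred_superpage`, the representation of reservations as half-open `(start, end)` ranges of virtual page numbers, the rule that a region may not start before the object, and the size list `(1, 8, 64, 512)`, which corresponds to 8 KB, 64 KB, 512 KB and 4 MB pages counted in 8 KB base pages.

```python
def preferred_superpage(fault_vpn, obj_start, obj_end, reserved,
                        sizes=(1, 8, 64, 512), growable=False):
    """Return (start_vpn, n_pages) for the largest eligible aligned region.

    obj_start/obj_end delimit the object as a half-open range of page numbers.
    reserved is a list of half-open (start, end) ranges already spoken for.
    growable=True applies the rule for dynamically sized objects.
    """
    obj_len = obj_end - obj_start
    for n in sorted(sizes, reverse=True):
        start = fault_vpn - (fault_vpn % n)  # aligned region holding the fault
        end = start + n
        if start < obj_start:
            continue  # assumption: never start before the object
        if growable:
            if n > obj_len:
                continue  # must not exceed the object's current size
        elif end > obj_end:
            continue      # must not cross the object's boundary
        if any(start < r_end and r_start < end for r_start, r_end in reserved):
            continue      # overlaps another (tentative) superpage
        return start, n
    return None


# A 100-page object starting at page 0, fault on page 70.
print(preferred_superpage(70, 0, 100, reserved=[]))                 # (64, 8)
print(preferred_superpage(70, 0, 100, reserved=[], growable=True))  # (64, 64)
print(preferred_superpage(73, 0, 100, reserved=[(64, 72)]))         # (72, 8)
```

The growable case picks a larger region because the reservation is allowed to run past the object's current end, in anticipation of the object growing into it.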
![Page 33: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/33.jpg)
Fragmentation control
• Mostly managed by buddy allocator
  – Helped by page replacement daemon
    • Modified BSD daemon is made "contiguity-aware"
![Page 34: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/34.jpg)
Promotion
• Use incremental promotion
• Wait until superpage is fully populated
• Conservative approach
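A minimal sketch of this policy, assuming the set of populated base pages is known and that each supported superpage size divides the next larger one (so aligned regions nest). The function name `largest_promotable` and the size list are illustrative choices, not taken from the paper.

```python
def largest_promotable(populated: set, faulted: int, sizes=(8, 64, 512)):
    """Return (start, n_pages) of the largest fully populated aligned region
    containing the page that was just faulted in, or None."""
    best = None
    for n in sorted(sizes):
        start = faulted - faulted % n
        if all(p in populated for p in range(start, start + n)):
            best = (start, n)  # promote at least this far
        else:
            # The next larger aligned region contains this one,
            # so it cannot be fully populated either.
            break
    return best


print(largest_promotable(set(range(10)), faulted=7))   # (0, 8)
print(largest_promotable(set(range(64)), faulted=63))  # (0, 64)
print(largest_promotable({0, 1, 2}, faulted=2))        # None
```

Calling this after every page fault gives incremental promotion: a region is first promoted to the smallest superpage size and later to larger ones as its neighbors fill up.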
![Page 35: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/35.jpg)
Demotion (I)
• Incremental demotion
  – Required when
    • A base page of a superpage is expelled from main memory
    • Protection attributes of some base pages are changed
![Page 36: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/36.jpg)
Demotion (II)
• Speculative demotion
  – Could be done each time a superpage referenced bit is reset
    • When memory becomes scarce
  – Let system know which parts of a superpage are still in use
![Page 37: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/37.jpg)
Handling dirty superpages (I)
• Demote superpages as soon as one of their base pages is modified
  – Otherwise would have to flush out the whole superpage when it is expelled from main memory
    • Because there is one single dirty bit per superpage
![Page 38: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/38.jpg)
Handling dirty superpages (II)
• A superpage has been modified
– The whole superpage is dirty
• We break up the superpage
– All other pages remain clean
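The cost difference can be made concrete with a small count of the pages written back at eviction time. The function below is a hypothetical model, not code from the paper; the example assumes a 4 MB superpage built from 8 KB base pages, two of the page sizes listed on the benchmark slide.

```python
def pages_flushed_on_eviction(base_pages: int, written: set,
                              demote_on_first_write: bool) -> int:
    """Count base pages that must be written back when the region is evicted."""
    if not written:
        return 0                 # nothing was modified
    if demote_on_first_write:
        # After demotion, each base page has its own dirty bit.
        return len(written)
    # A single dirty bit covers the whole superpage.
    return base_pages


BASE_PAGES = (4 * 1024) // 8     # a 4 MB superpage made of 8 KB base pages

kept = pages_flushed_on_eviction(BASE_PAGES, {17}, demote_on_first_write=False)
demoted = pages_flushed_on_eviction(BASE_PAGES, {17}, demote_on_first_write=True)
print(kept, demoted)             # 512 1
```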
![Page 39: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/39.jpg)
Multi-list reservation scheme
• Maintains separate lists for each superpage size supported by the hardware, except the largest one
• Each list contains reserved frames that could still accommodate a superpage of that size
  – Sorted by time of their most recent page frame allocation
  – Oldest entries are preempted first
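One way to picture these lists is as one ordered dictionary per size, where re-inserting a reservation moves it to the tail. The sketch below is a guess at a workable structure, not the paper's implementation: the class `ReservationLists`, its method names, and the fallback to the next larger list when the requested one is empty are all assumptions.

```python
from collections import OrderedDict


class ReservationLists:
    """One list per supported size (in base pages) except the largest."""

    def __init__(self, sizes):
        self.sizes = sorted(sizes)[:-1]
        # Each list is ordered from least to most recently touched.
        self.lists = {n: OrderedDict() for n in self.sizes}

    def remove(self, res_id) -> None:
        for lst in self.lists.values():
            lst.pop(res_id, None)

    def touch(self, res_id, largest_free_extent: int) -> None:
        """Record a page frame allocation inside reservation res_id."""
        self.remove(res_id)
        fits = [n for n in self.sizes if n <= largest_free_extent]
        if fits:
            # File it under the largest size it could still yield.
            self.lists[max(fits)][res_id] = largest_free_extent

    def preempt(self, wanted: int):
        """Give up the oldest reservation able to yield `wanted` frames."""
        for n in self.sizes:
            if n >= wanted and self.lists[n]:
                res_id, _ = self.lists[n].popitem(last=False)
                return res_id
        return None


lists = ReservationLists([1, 8, 64, 512])
lists.touch("A", 64)
lists.touch("B", 64)
lists.touch("A", 64)      # A was used again, so B is now the oldest
lists.touch("C", 8)
first = lists.preempt(64)
second = lists.preempt(8)
third = lists.preempt(8)  # the 8-page list is empty, fall back to 64
print(first, second, third)  # B C A
```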
![Page 40: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/40.jpg)
Example
• Area above contains 8 page frames reserved for a possible superpage
  – Three frames are allocated, five are free
  – Breaking the reservation will free space for
    • A superpage with 4 base pages or
    • Two superpages with two base pages each
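The free space released by breaking a reservation can be computed by recursively halving the region until each piece is entirely free. The slide's figure is not available in this transcript, so the sketch below assumes, purely for illustration, that the three allocated frames are frames 0, 1 and 2; the function name `free_extents` is also an assumption.

```python
def free_extents(n_frames: int, allocated: set) -> list:
    """Split the free frames of a reservation into maximal aligned
    power-of-two extents, returned as (start, size) pairs.
    n_frames is assumed to be a power of two."""

    def split(start: int, size: int) -> list:
        if not any(f in allocated for f in range(start, start + size)):
            return [(start, size)]  # whole block is free
        if size == 1:
            return []               # a single allocated frame
        half = size // 2
        return split(start, half) + split(start + half, half)

    return split(0, n_frames)


extents = free_extents(8, {0, 1, 2})
print(extents)  # [(3, 1), (4, 4)]
```

With this layout, the free half yields one aligned extent of four frames, which could hold one 4-page superpage or be split again into two 2-page superpages, plus one leftover base page.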
![Page 41: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/41.jpg)
Population maps
• One per memory object
• Keep track of allocated pages within each object
![Page 42: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/42.jpg)
EVALUATION
![Page 43: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/43.jpg)
Benchmarks
• Thirty-five representative programs running on an Alpha processor
  – Four page sizes: 8 KB, 64 KB, 512 KB and 4 MB
  – Fully associative TLB with 128 entries for code and 128 for data
  – 512 MB of RAM
  – Separate 64 KB code and 64 KB data L1 caches
  – 4 MB unified L2 cache
![Page 44: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/44.jpg)
Results (I)
• Eighteen out of 35 benchmarks showed improvements over 5 percent
• Ten out of 35 showed improvements over 25 percent
• A single application showed a degradation of 1.5 percent
  – Allocator does not distinguish zeroed-out pages from other free pages
![Page 45: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/45.jpg)
Results (II)
• Different applications benefit most from different superpage sizes
  – Should let system choose among multiple page sizes
• Contiguity-aware page replacement daemon can maintain enough contiguous regions
• Huge penalty for not demoting dirty superpages
• Overheads are small
![Page 46: Practical, Transparent Operating System Support for Superpages](https://reader036.fdocuments.net/reader036/viewer/2022062315/568159a5550346895dc70315/html5/thumbnails/46.jpg)
CONCLUSION
• It works and does not require any changes to existing hardware