Adapted from UC Berkeley CS252 S01

Lecture 19: Virtual Memory

Virtual Memory concept, Virtual-physical translation, page table, TLB, Alpha 21264 memory hierarchy

Virtual Memory

Virtual memory (VM) gives programs the illusion of a very large memory that is not limited by the physical memory size

Main memory (DRAM) acts like a cache for secondary storage (magnetic disk)

Otherwise, application programmers have to move data in and out of main memory themselves

That is how virtual memory was first proposed

Virtual memory also provides the following functions:

Allowing multiple processes to share the physical memory in a multiprogramming environment

Providing protection for processes (compare the Intel 8086: without VM, applications can overwrite the OS kernel)

Facilitating program relocation in the physical memory space

VM Example

Virtual Memory and Cache

VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or secondary storage.

Cache terms vs. VM terms: cache block => page; cache miss => page fault

Tasks of hardware and OS:

The TLB does fast address translations

The OS handles less frequent events: page faults, and TLB misses (when a software approach is used)

Virtual Memory and Cache

| Parameter | L1 Cache | Main Memory |
|---|---|---|
| Block (page) size | 16-128 bytes | 4 KB to 64 KB |
| Hit time | 1-3 cycles | 50-150 cycles |
| Miss penalty | 8-300 cycles | 1M to 10M cycles |
| Miss rate | 0.1-10% | 0.00001-0.001% |
| Address mapping | 25-45 bits => 13-21 bits | 32-64 bits => 25-45 bits |
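To see why such a tiny miss rate still matters at the main-memory level, the standard relation "average access time = hit time + miss rate x miss penalty" can be applied to the table. The sketch below is only illustrative: the function name is made up here, and the specific values are picked by assumption from inside the ranges in the table.

```python
def average_access_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles per access: hit time plus the expected cost of misses."""
    return hit_time + miss_rate * miss_penalty


# Assumed mid-range values for main memory acting as a cache for disk:
# 100-cycle hit time, 0.0001% page-fault rate, 5M-cycle miss penalty.
vm_cycles = average_access_cycles(100, 0.0001 / 100, 5_000_000)

# Assumed values for an L1 cache: 2-cycle hit, 2% miss rate, 100-cycle penalty.
l1_cycles = average_access_cycles(2, 0.02, 100)

print(f"main memory: about {vm_cycles:.0f} cycles per access")
print(f"L1 cache:    about {l1_cycles:.0f} cycles per access")
```

With these assumed numbers, one page fault per million accesses already adds about 5 cycles to every main-memory access on average, which is why the page-fault rate has to be kept so low.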

4 Qs for Virtual Memory

Q1: Where can a block be placed in the upper level?

The miss penalty for virtual memory is very high => full associativity is desirable (blocks may be placed anywhere in memory)

Software determines the location while the disk is being accessed (10M cycles is enough time for sophisticated replacement)

Q2: How is a block found if it is in the upper level?

The address is divided into a page number and a page offset

A page table and a translation buffer are used for address translation

Q: Why does full associativity not affect hit time?

4 Qs for Virtual Memory (continued)

Q3: Which block should be replaced on a miss?

We want to reduce the miss rate, and replacement can be handled in software

Least Recently Used (LRU) is typically used

A typical approximation of LRU:

Hardware sets reference bits

The OS records the reference bits and clears them periodically

The OS selects a page among the least recently referenced for replacement

Q4: What happens on a write?

Writing to disk is very expensive, so use a write-back strategy
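The reference-bit approximation of LRU described under Q3 can be sketched as follows. This is one possible scheme (often called "aging"), not necessarily the one any particular OS uses; the class name, method names, and the 8-bit history width are assumptions made for the example.

```python
class ReferenceBitTracker:
    """Approximate LRU from per-page reference bits sampled periodically."""

    def __init__(self, pages, history_bits=8):
        self.history_bits = history_bits
        self.ref_bit = {page: 0 for page in pages}   # set by "hardware"
        self.history = {page: 0 for page in pages}   # recorded by the "OS"

    def touch(self, page):
        """Models the hardware setting the reference bit on an access."""
        self.ref_bit[page] = 1

    def tick(self):
        """Models the periodic OS pass: record each bit, then clear it."""
        top = 1 << (self.history_bits - 1)
        for page, bit in self.ref_bit.items():
            # Shift older samples right; the newest sample becomes the top bit.
            self.history[page] = (self.history[page] >> 1) | (top if bit else 0)
            self.ref_bit[page] = 0

    def victim(self):
        """Pick a page with the smallest history, i.e. least recently referenced."""
        return min(self.history, key=self.history.get)


tracker = ReferenceBitTracker(["A", "B", "C"])
for accessed in (["A", "B", "C"], ["A", "B"], ["A"]):
    for page in accessed:
        tracker.touch(page)
    tracker.tick()
print(tracker.victim())  # page C has gone longest without a reference
```

Pages referenced in recent intervals keep large history values, so the page with the smallest value is a reasonable replacement candidate without tracking exact access order.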

Virtual and Physical Addresses

A virtual address consists of a virtual page number and a page offset.

The virtual page number is translated to a physical page number. The page offset is not changed.

Virtual address: virtual page number (36 bits) + page offset (12 bits)

Translation

Physical address: physical page number (33 bits) + page offset (12 bits)
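The split between page number and page offset is plain bit manipulation. The sketch below assumes the 12-bit page offset shown on the slide; the function names and the example addresses are made up for illustration.

```python
PAGE_OFFSET_BITS = 12  # 12-bit page offset, as on the slide (4 KB pages)


def split_virtual_address(va):
    """Return (virtual page number, page offset) for a virtual address."""
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    return vpn, offset


def make_physical_address(ppn, offset):
    """Join a physical page number with the unchanged page offset."""
    return (ppn << PAGE_OFFSET_BITS) | offset


vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))
# Suppose translation maps this virtual page to physical page 0xABC (made up).
print(hex(make_physical_address(0xABC, offset)))
```

Only the page number goes through translation; the low 12 bits are copied straight into the physical address.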

Address Translation Via Page Table

Assume the access hits in main memory

Address Translation with Page Tables

A page table translates a virtual page number into a physical page number

A page table register indicates the start of the page table

The virtual page number is used as an index into the page table, whose entries contain:

The physical page number

A valid bit that indicates whether the page is present in main memory

A dirty bit that indicates whether the page has been written

Protection information about the page (read only, read/write, etc.)

Since the page table contains a mapping for every virtual page, no tags are required (how does this compare with a cache?)

Page table access is slow; we will see the solution
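A minimal sketch of a page table lookup using the entry fields listed above. The `PTE` layout, the exception names, the single `writable` flag standing in for the protection information, and the 13-bit page offset are all assumptions for illustration; real page table entries pack these fields into bits.

```python
from dataclasses import dataclass

PAGE_OFFSET_BITS = 13  # assumed: 8 KB pages


class PageFault(Exception):
    pass


class ProtectionFault(Exception):
    pass


@dataclass
class PTE:
    ppn: int = 0            # physical page number
    valid: bool = False     # page is present in main memory
    dirty: bool = False     # page has been written
    writable: bool = False  # simplified protection information


def translate(page_table, va, is_write=False):
    """Translate a virtual address with a flat, directly indexed page table."""
    vpn = va >> PAGE_OFFSET_BITS
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    pte = page_table[vpn]   # direct index by VPN: no tag comparison needed
    if not pte.valid:
        raise PageFault(vpn)
    if is_write:
        if not pte.writable:
            raise ProtectionFault(vpn)
        pte.dirty = True    # remember that the page must be written back
    return (pte.ppn << PAGE_OFFSET_BITS) | offset


table = [PTE() for _ in range(4)]
table[1] = PTE(ppn=7, valid=True, writable=True)
pa = translate(table, (1 << PAGE_OFFSET_BITS) | 5)
print(hex(pa))
```

Because the virtual page number selects the entry directly, there is nothing to compare against, which is the sense in which no tags are required.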

Page Table Diagram

Accessing Main Memory or Disk

A valid bit of zero means the page is not in main memory. A page fault then occurs, and the missing page is read in from disk.

How Large Is the Page Table?

Suppose:

48-bit virtual address

41-bit physical address

8 KB pages => 13-bit page offset

Each page table entry is 8 bytes

How large is the page table?

Virtual page number = 48 - 13 = 35 bits

Number of entries = number of pages = 2^35 = 32G

Total size = number of entries x bytes/entry = 32G x 8 B = 256 GB

Each process needs its own page table

Page tables have to be very large, so they must be stored in main memory or even paged themselves, resulting in slow access

We need techniques to reduce page table size
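The arithmetic can be checked directly from the stated parameters (48-bit virtual address, 8 KB pages, 8-byte entries). With a 13-bit page offset the virtual page number is 48 - 13 = 35 bits wide. The function name below is made up for the example, and it assumes a single flat page table with one entry per virtual page.

```python
def flat_page_table_bytes(virtual_address_bits, page_bytes, pte_bytes):
    """Size of a flat page table with one entry per virtual page."""
    offset_bits = page_bytes.bit_length() - 1   # log2 of a power-of-two page size
    vpn_bits = virtual_address_bits - offset_bits
    entries = 1 << vpn_bits
    return entries * pte_bytes


size = flat_page_table_bytes(48, 8 * 1024, 8)
print(size // 2**30, "GB per process")   # 2^35 entries x 8 bytes = 2^38 bytes
```

A table of this size per process is clearly impractical, which motivates the techniques for reducing page table size.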

TLB: Improving Page Table Access

We cannot afford to access the page table on every memory access, including cache hits (otherwise the cache itself makes no sense)

Again, use a cache to speed up accesses to the page table! (a cache for a cache?)

The TLB (translation lookaside buffer) stores frequently accessed page table entries

A TLB entry is like a cache entry:

The tag holds portions of the virtual address

The data portion holds the physical page number, protection field, valid bit, use bit, and dirty bit (as in a page table entry)

Usually fully associative or highly set associative

Usually 64 or 128 entries

The page table is accessed only on TLB misses
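The behavior described above can be sketched as a small fully associative cache keyed by virtual page number. This is a toy model: the class name is made up, the page table is just a dictionary from VPN to PPN, and LRU replacement is an assumption (the slide does not say which policy a TLB uses).

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (an assumed policy)."""

    def __init__(self, entries=64):
        self.entries = entries
        self.map = OrderedDict()   # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)      # mark as most recently used
            return self.map[vpn]
        self.misses += 1
        ppn = page_table[vpn]              # slow path: go to the page table
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)   # evict the least recently used entry
        self.map[vpn] = ppn
        return ppn


page_table = {0: 10, 1: 11, 2: 12}
tlb = TLB(entries=2)
for vpn in (0, 0, 1, 2, 0):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)
```

Only the misses touch `page_table`; every hit is served from the small map, which is the point of the TLB.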

TLB Characteristics

The following are characteristics of TLBs:

TLB size: 32 to 4,096 entries

Block size: 1 or 2 page table entries (4 or 8 bytes each)

Hit time: 0.5 to 1 clock cycle

Miss penalty: 10 to 30 clock cycles (go to page table)

Miss rate: 0.01% to 0.1%

Associativity: fully associative or set associative

Write policy: write back (replaced infrequently)

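Alpha 21264 Data TLB

128 entries, fully associative

An ASN (address space number, similar to a process ID) tags each entry so that the TLB does not need to be flushed on a context switch

Protection is also checked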
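The effect of an ASN can be sketched by making it part of the lookup key: entries from different address spaces can then coexist, so nothing has to be discarded when the running process changes. The class below is a hypothetical illustration that ignores capacity, replacement, and protection bits.

```python
class AsnTaggedTLB:
    """Toy TLB whose entries are tagged with an address space number."""

    def __init__(self):
        self.entries = {}   # (asn, vpn) -> ppn

    def insert(self, asn, vpn, ppn):
        self.entries[(asn, vpn)] = ppn

    def lookup(self, asn, vpn):
        """Return the physical page number, or None on a TLB miss."""
        return self.entries.get((asn, vpn))


tlb = AsnTaggedTLB()
tlb.insert(asn=1, vpn=5, ppn=100)   # process with ASN 1
tlb.insert(asn=2, vpn=5, ppn=200)   # same virtual page, different process
print(tlb.lookup(1, 5), tlb.lookup(2, 5), tlb.lookup(3, 5))
```

Without the ASN in the key, the second insert would overwrite the first, and the only safe option on a context switch would be to flush every entry.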

Determine Page Size

| Factor | Favors | Comments |
|---|---|---|
| Page table size | Larger pages | Inversely proportional to page size |
| Fast L1 cache hit | Larger pages | L1 cache can be larger |
| I/O utilization | Larger pages | Longer burst transfer |
| TLB hit rate | Larger pages | Increasing TLB coverage |
| Storage efficiency | Smaller pages | Reducing fragmentation |
| I/O efficiency | Smaller pages | Avoiding unnecessary data transfer |
| Process start-up | Smaller pages | Small processes are popular |

Most commonly used size: 4KB or 8KB

Hardware may support a range of page sizes

The OS selects the best one(s) for its purpose
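Two of these trade-offs are easy to quantify: TLB coverage (the amount of memory reachable without a TLB miss) grows with page size, while the space wasted in a process's last, partly filled page also grows. The function names and the 10,000-byte process below are assumptions chosen for the example; the 128-entry TLB matches the Alpha 21264 figure given in this lecture.

```python
def tlb_coverage_bytes(tlb_entries, page_bytes):
    """Memory that can be mapped by the TLB at one time."""
    return tlb_entries * page_bytes


def wasted_bytes(process_bytes, page_bytes):
    """Internal fragmentation: unused bytes in the last allocated page."""
    pages = -(-process_bytes // page_bytes)   # ceiling division
    return pages * page_bytes - process_bytes


for page_kb in (4, 8, 64):
    page = page_kb * 1024
    print(
        f"{page_kb:>2} KB pages: "
        f"coverage {tlb_coverage_bytes(128, page) // 1024} KB, "
        f"waste for a 10,000-byte process {wasted_bytes(10_000, page)} bytes"
    )
```

Larger pages extend coverage in direct proportion, but a small process pays for it in unused space, which is one reason an OS may want more than one page size available.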

Alpha 21264 TLB Access

Virtually indexed, physically tagged

Physically indexed, physically tagged
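A virtually indexed, physically tagged cache can start its set lookup in parallel with the TLB, because the index comes from the virtual address. If the index and block-offset bits all fall within the page offset, the index is identical in the virtual and physical address. The sketch below checks how many index bits fall outside the page offset; the function name is made up, and the example figures (64 KB, 2-way L1 data cache; 8 KB pages) are the ones quoted elsewhere in this lecture.

```python
def index_bits_beyond_page_offset(cache_bytes, ways, page_bytes):
    """Number of cache index bits that come from the virtual page number.

    Assumes power-of-two sizes. The index plus block-offset bits together
    address one way of the cache, i.e. cache_bytes / ways bytes.
    """
    way_bytes = cache_bytes // ways
    index_plus_block_bits = way_bytes.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return max(0, index_plus_block_bits - page_offset_bits)


# 64 KB, 2-way cache with 8 KB pages: each way is 32 KB (15 address bits),
# but only 13 bits are untranslated page offset.
print(index_bits_beyond_page_offset(64 * 1024, 2, 8 * 1024))
# A 16 KB, 2-way cache with the same page size stays within the page offset.
print(index_bits_beyond_page_offset(16 * 1024, 2, 8 * 1024))
```

This is also the link to the page-size discussion: a larger page, or higher associativity, lets a larger L1 cache be indexed entirely from untranslated bits.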

Alpha 21264 Virtual Memory

Combining segmentation and paging:

Segmentation: a variable-size memory space range, usually defined by a base register and a limit field

Segmentation assigns meanings to address spaces and reduces the address space that needs paging (reducing page table size)

Paging is used on the address space of each segment

Three segments in Alpha:

kseg: reserved for the OS kernel, no VM management

seg0: virtual addresses accessible to user processes

seg1: virtual addresses accessible to the OS kernel

Two Viewpoints of Virtual Memory

Application programs:

See a large, flat memory space

Assume fast access to every location

Hardware/OS hide the complexity

OS kernel:

Manages multiple process spaces

Reserves direct access to some portions of physical memory

May access physical memory, its own virtual memory, and the virtual memory of the current process

Hardware facilitates fast VM accesses, and the OS manages slow, less frequent events

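Alpha 21264 Page Table

Figure labels: 10-bit index field for each page table level; 1024 8-byte PTEs per page table; 13-bit page offset in both the virtual and the physical address; 28-bit physical page number

Page table access on a TLB miss is managed by software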
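A multi-level walk of this kind can be sketched as below. It assumes three levels of 10-bit index fields above a 13-bit page offset (a 43-bit virtual address), which is how the 10-bit and 13-bit labels on the slide are read here. Nested dictionaries stand in for the 1024-entry tables, and all names are made up for the example.

```python
LEVEL_BITS = 10        # each level indexes a table of 1024 PTEs
PAGE_OFFSET_BITS = 13  # 8 KB pages
LEVEL_MASK = (1 << LEVEL_BITS) - 1


def split_fields(va):
    """Split a virtual address into (level1, level2, level3, offset)."""
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = va >> PAGE_OFFSET_BITS
    level3 = vpn & LEVEL_MASK
    level2 = (vpn >> LEVEL_BITS) & LEVEL_MASK
    level1 = (vpn >> (2 * LEVEL_BITS)) & LEVEL_MASK
    return level1, level2, level3, offset


def walk(root, va):
    """Follow the three levels; a missing key models an unmapped page."""
    level1, level2, level3, offset = split_fields(va)
    ppn = root[level1][level2][level3]
    return (ppn << PAGE_OFFSET_BITS) | offset


# One mapped page: indices (3, 2, 1) map to physical page 0x55 (made up).
root = {3: {2: {1: 0x55}}}
va = (3 << 33) | (2 << 23) | (1 << 13) | 0x1ABC
print(split_fields(va), hex(walk(root, va)))
```

Only the tables that are actually in use need to exist, which is how a multi-level organization keeps the page table far smaller than a flat table covering the whole virtual address space.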

Memory Protection

Memory protection: preventing unauthorized accesses to process and kernel memory

Memory protection implementation:

User programs can access memory only through virtual memory

Each PTE contains protection bits to allow shared but protected accesses

Protection fields in Alpha: valid, user read enable, kernel read enable, user write enable, and kernel write enable
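How the five protection fields might gate an access can be sketched as follows. The field names mirror the list above, but the data structure and the `access_allowed` function are hypothetical; the real PTE encodes these as individual bits and the check is made during translation.

```python
from dataclasses import dataclass


@dataclass
class Protection:
    valid: bool
    user_read: bool
    kernel_read: bool
    user_write: bool
    kernel_write: bool


def access_allowed(prot, kernel_mode, is_write):
    """Return True if the access is permitted by the protection fields."""
    if not prot.valid:
        return False
    if is_write:
        return prot.kernel_write if kernel_mode else prot.user_write
    return prot.kernel_read if kernel_mode else prot.user_read


# A page that user code may read but only the kernel may write.
shared = Protection(valid=True, user_read=True, kernel_read=True,
                    user_write=False, kernel_write=True)
print(access_allowed(shared, kernel_mode=False, is_write=False))
print(access_allowed(shared, kernel_mode=False, is_write=True))
print(access_allowed(shared, kernel_mode=True, is_write=True))
```

Separate user and kernel enables are what allow a page to be shared while still being protected: both sides can map it, with different rights.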

Memory Hierarchy Example: Alpha 21264 in AlphaServer ES40

L1 instruction cache: 2-way, 64KB, 64-byte block, virtually indexed and tagged

Uses way prediction and line prediction to allow instruction fetching

Instruction prefetcher: stores four prefetched instructions, accessed before the L2 cache

L1 data cache: 2-way, 64KB, 64-byte block, virtually indexed, physically tagged, write-through

Victim buffer: 8-entry, checked before L2 access

L2 unified cache: 1-way (direct-mapped), 1MB to 16MB, off-chip, write-back

Allows critical-word transfer to the L1 cache; transfers 16B per 2.25ns

TLB: 128-entry, fully associative, one each for instructions and data

ES40: L1 miss penalty 22ns, L2 miss penalty 130ns; up to 32GB memory; 256-bit memory buses (64-bit into the processor)

Read 5.13 for more details