Transcript of "Computer Architecture – Virtual Memory", Dr. Lihu Rappoport, 2008 (38 slides).

Page 1

Computer Architecture

Virtual Memory

Dr. Lihu Rappoport

Page 2

Virtual Memory

Provides the illusion of a large memory

Different machines have different amounts of physical memory
– Allows programs to run regardless of actual physical memory size

The amount of memory consumed by each process is dynamic
– Allows adding memory as needed

Many processes can run on a single machine
– Provides each process its own memory space
– Prevents a process from accessing the memory of other processes running on the same machine
– Allows the sum of the memory spaces of all processes to be larger than physical memory

Basic terminology
– Virtual Address Space: the address space used by the programmer
– Physical Address: the actual physical memory address space

Page 3

Virtual Memory: Basic Idea

Divide memory (virtual and physical) into fixed-size blocks
– Pages in virtual space, Frames in physical space
– Page size = Frame size
– Page size is a power of 2: page size = 2^k

All pages in the virtual address space are contiguous

Pages can be mapped into physical frames in any order

Some of the pages are in main memory (DRAM), some of the pages are on disk

All programs are written using the virtual memory address space

The hardware does on-the-fly translation between virtual and physical address spaces
– Uses a Page Table to translate between virtual and physical addresses
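The idea above — contiguous virtual pages mapped to frames in arbitrary order, with some pages on disk — can be sketched with a toy page table. The sizes and mappings here are illustrative, not from the slides:

```python
# Toy illustration: contiguous virtual pages, frames in arbitrary order.
PAGE_SIZE = 2 ** 12  # 4 KB; a power of two, as the slide requires

# Hypothetical page table: virtual page number -> physical frame number.
# Page 2 is on disk (None), so touching it would be a page fault.
page_table = {0: 5, 1: 2, 2: None, 3: 7}

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pfn = page_table[vpn]
    if pfn is None:
        raise RuntimeError("page fault: page %d is on disk" % vpn)
    return pfn * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # page 1 maps to frame 2 -> 0x2234
```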

Page 4

Main memory can act as a cache for the secondary storage (disk)

Advantages:
– illusion of having more physical memory
– program relocation
– protection

[Figure: Virtual Addresses go through Address Translation to Physical Addresses; unmapped pages translate to Disk Addresses.]

Page 5

Virtual to Physical Address Translation

[Figure: a 32-bit Virtual Address is split into a Virtual Page Number (bits 31:12) and a Page offset (bits 11:0). The Virtual Page Number, together with the page table base register, selects a page table entry holding a Valid bit (V), a Dirty bit (D), Access Control bits (AC), and a Frame number. The Physical Address is the Physical Frame Number (bits 29:12) concatenated with the same Page offset (bits 11:0).]

Page size: 2^12 bytes = 4K bytes
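The bit-level split in the figure can be written out with shifts and masks; for a 2^12-byte page, the low 12 bits pass through untranslated:

```python
# Bit-level view of the translation on this slide (4 KB pages).
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1   # 0xFFF, the untranslated offset bits

def split(vaddr):
    # Returns (virtual page number, page offset).
    return vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK

def combine(pfn, offset):
    # Physical address = frame number concatenated with the offset.
    return (pfn << PAGE_SHIFT) | offset
```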

Page 6

Page Tables

[Figure: the Virtual page number indexes the Page Table. Each entry holds a Valid bit and a Physical Page or Disk Address. Entries with Valid = 1 point to frames in Physical Memory; entries with Valid = 0 point to pages on Disk.]

Page 7

Address Mapping Algorithm

If V = 1 then
    the page is in main memory at the frame address stored in the table
    Fetch data
else (page fault)
    need to fetch the page from disk
    causes a trap, usually accompanied by a context switch:
    the current process is suspended while the page is fetched from disk

Access Control (R = Read-only, R/W = read/write, X = execute only)
If the kind of access is not compatible with the specified access rights, then protection_violation_fault
    causes a trap to a hardware or software fault handler

A missing item is fetched from secondary memory only on the occurrence of a fault – demand load policy
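The mapping algorithm on this slide can be sketched directly; the field names are illustrative (a real PTE packs these into bits):

```python
# Sketch of the address-mapping algorithm: valid-bit check, then
# access-rights check, then the frame number is returned.
class PageFault(Exception):
    pass

class ProtectionViolation(Exception):
    pass

PTE = {"v": 1, "ac": "R", "frame": 3}   # one hypothetical read-only entry

def check_and_map(pte, access):
    if not pte["v"]:
        raise PageFault            # OS traps, fetches the page from disk
    if access == "W" and pte["ac"] == "R":
        raise ProtectionViolation  # access incompatible with rights
    return pte["frame"]
```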

Page 8

Page Replacement Algorithm

Not Recently Used (NRU)
– Associated with each page is a reference flag such that
    ref flag = 1 if the page has been referenced in the recent past
– If replacement is needed, choose any page frame whose reference bit is 0
    This is a page that has not been referenced in the recent past

Clock implementation of NRU:

While (PT[LRP].NRU) { PT[LRP].NRU = 0; LRP++ (mod table size) }

[Figure: page table entries arranged in a circle, each with a Ref bit; the clock hand sweeps around, clearing Ref bits, until it reaches an entry whose Ref bit is 0.]

Possible optimization: search for a page that is both not recently referenced AND not dirty
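The clock sweep above can be made runnable in a few lines; `ref_bits` and the starting hand position are made-up inputs:

```python
# Runnable version of the clock/NRU sweep: clear reference bits until
# an unreferenced entry is found; that entry is the replacement victim.
def clock_select(ref_bits, hand):
    """ref_bits: list of 0/1 reference flags; hand: current clock position."""
    while ref_bits[hand]:
        ref_bits[hand] = 0                  # give the page a second chance
        hand = (hand + 1) % len(ref_bits)   # advance mod table size
    return hand                             # victim: its ref bit was 0

bits = [1, 1, 0, 1]
victim = clock_select(bits, 0)  # sweeps past frames 0 and 1, stops at 2
```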

Page 9

Page Faults

Page fault: the data is not in memory – retrieve it from disk
– The CPU must detect the situation
– The CPU cannot remedy the situation (it has no knowledge of the disk)
– The CPU must trap to the operating system so that it can remedy the situation
    Pick a page to discard (possibly writing it to disk)
    Load the page in from disk
    Update the page table
    Resume the program so the HW will retry and succeed!

A page fault incurs a huge miss penalty
– Pages should be fairly large (e.g., 4KB)
– Faults can be handled in software instead of hardware
– A page fault causes a context switch
– Using write-through is too expensive, so we use write-back

Page 10

Optimal Page Size

Minimize wasted storage
– Small pages minimize internal fragmentation
– Small pages increase the size of the page table

Minimize transfer time
– Large pages (multiple disk sectors) amortize access cost
– Sometimes transfer unnecessary info
– Sometimes prefetch useful data
– Sometimes discard useless data early

General trend toward larger pages because
– Big cheap RAM
– Increasing memory/disk performance gap
– Larger address spaces

Page 11

Translation Lookaside Buffer (TLB)

The page table resides in memory – each translation requires a memory access

TLB
– Caches recently used PTEs
– Speeds up translation
– Typically 128 to 256 entries
– Usually 4 to 8 way associative
– TLB access time is comparable to L1 cache access time

[Flowchart: Virtual Address → TLB Access → TLB Hit? If Yes, Physical Address; if No, Access Page Table, then Physical Address.]

Page 12

Making Address Translation Fast

The TLB is a cache for recent address translations:

[Figure: the Virtual page number first looks up the TLB (Valid, Tag, Physical Page fields); on a TLB miss it indexes the Page Table (Valid, Physical Page or Disk Address), whose valid entries point to frames in Physical Memory and invalid entries to pages on Disk.]

Page 13

TLB Access

[Figure: the Virtual page number is split into Tag and Set fields (followed by the page Offset). The Set# selects one set in a 4-way set-associative TLB; the Tag is compared (=) against the stored tags of Way 0 – Way 3. A match drives the Hit/Miss signal and the Way MUX, which selects the PTE.]

Page 14

Virtual Memory And Cache

[Flowchart: Virtual Address → TLB Access → TLB Hit? If No, Access Page Table first. With the resulting Physical Address, Access Cache → Cache Hit? If Yes, return Data; if No, Access Memory.]

TLB access is serial with cache access
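The serial flow in the chart can be sketched as code: translate first (TLB, then page table on a miss), and only then probe the cache. All structures here are toy dicts standing in for hardware:

```python
# Serial TLB -> page table -> cache flow, as a sketch.
PAGE = 4096
tlb = {}                     # VPN -> PFN (tiny, fully associative here)
page_table = {0: 8, 1: 9}    # hypothetical mappings
cache = {}                   # physical address -> cached data

def load(vaddr, memory):
    vpn, off = divmod(vaddr, PAGE)
    if vpn in tlb:                 # TLB hit: translation is immediate
        pfn = tlb[vpn]
    else:                          # TLB miss: walk the page table, refill TLB
        pfn = page_table[vpn]
        tlb[vpn] = pfn
    paddr = pfn * PAGE + off       # only now can the cache be accessed
    if paddr not in cache:         # cache miss: go to memory
        cache[paddr] = memory.get(paddr, 0)
    return cache[paddr]
```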

Page 15

More On Page Swap-out

DMA copies the page to the disk controller
– Reads each byte: executes snoop-invalidate for each byte in the cache (both L1 and L2). If the byte resides in the cache:
    if it is modified, reads its line from the cache into memory
    invalidates the line
– Writes the byte to the disk controller
– This means that when a page is swapped out of memory
    All data in the caches which belongs to that page is invalidated
    The page on the disk is up-to-date

The TLB is snooped
– If the TLB hits for the swapped-out page, the TLB entry is invalidated

In the page table
– The valid bit in the PTE of the swapped-out page is set to 0
– All the rest of the bits in the PTE may be used by the operating system for keeping the location of the page on the disk

Page 16

Overlapped TLB & Cache Access

#Set is not contained within the Page Offset
– The #Set is not known until the physical page number is known
– The cache can be accessed only after address translation is done

[Figure: Virtual Memory view of a Physical Address: Page Number (bits 29:12), Page offset (bits 11:0). Cache view of a Physical Address: tag (bits 29:14), set (bits 13:6), disp (bits 5:0).]

Page 17

Overlapped TLB & Cache Access (cont)

In the example above, the #Set is contained within the Page Offset
– The #Set is known immediately
– The cache can be accessed in parallel with address translation
– Once translation is done, match the upper bits with the tags

Limitation: Cache size ≤ (page size × associativity)

[Figure: Virtual Memory view of a Physical Address: Page Number (bits 29:12), Page offset (bits 11:0). Cache view of a Physical Address: tag (bits 29:12), set (bits 11:6), disp (bits 5:0).]

Page 18

Overlapped TLB & Cache Access (cont)

Assume 4K bytes per page – bits [11:0] are not translated

Assume the cache is 32K bytes, 2-way set-associative, 64 bytes/line
– (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets

Physical_addr[13:12] may be different from virtual_addr[13:12]
– The tag is comprised of bits [31:12] of the physical address

The tag may mis-match in bits [13:12] of the physical address
– On a cache miss, allocate the missing line according to its virtual set address and physical tag

[Figure: Page Number (bits 29:12), Page offset (bits 11:0); cache view: tag (bits 31:14), set (bits 13:6), disp (bits 5:0).]
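The set-count arithmetic on this slide, spelled out as a check:

```python
# A 32 KB, 2-way, 64 B/line cache has (2^15 / 2) / 2^6 = 2^8 = 256 sets,
# so the set index occupies bits [13:6] -- two bits above the 4 KB
# page offset, which is why the index is not fully untranslated.
cache_bytes = 2 ** 15
ways = 2
line_bytes = 2 ** 6

sets = cache_bytes // ways // line_bytes
set_bits = sets.bit_length() - 1      # log2(256) = 8
line_bits = 6

print(sets, "sets; set index = bits [%d:%d]"
      % (line_bits + set_bits - 1, line_bits))
```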

Page 19

Context Switch

Each process has its own address space
– Each process has its own page table
– The OS allocates frames in physical memory to each process, and updates the page table of each process
– A process cannot access physical memory allocated to another process
    Unless the OS deliberately allocates the same physical frame to 2 processes (for memory sharing)

On a context switch
– Save the current architectural state to memory
    Architectural registers
    The register that holds the page table base address in memory
– Flush the TLB
– Load the new architectural state from memory
    Architectural registers
    The register that holds the page table base address in memory

Page 20

VM in VAX: Address Format

Page size: 2^9 bytes = 512 bytes

[Figure: Virtual Address: bits 31:30 select the space, bits 29:9 hold the Virtual Page Number, bits 8:0 the Page offset.
Bits 31:30: 00 – P0 process space (code and data); 01 – P1 process space (stack); 10 – S0 system space; 11 – S1.
Physical Address: Physical Frame Number (bits 29:9), Page offset (bits 8:0).]

Page 21

VM in VAX: Virtual Address Spaces

[Figure: each process (Process0 … Process3) has its own P0 and P1 spaces in addresses 0 through 7FFFFFFF, and all share the system space starting at 80000000.]

P0 process code & global vars grow upward

P1 process stack & local vars grow downward

S0 system space grows upward, generally static

Page 22

Page Table Entry (PTE)

[Figure: PTE fields: V (bit 31), PROT (4 protection bits), M (modified bit), Z (indicates if the line was cleaned/zeroed), OWN (3 ownership bits), S S, and the Physical Frame Number (bits 20:0).]

Valid bit = 1 if the page is mapped to main memory; otherwise page fault:
• the page is in the disk swap area
• the address indicates the page location on the disk

Page 23

System Space Address Translation

[Figure: a system-space Virtual Address has bits 31:30 = 10, the VPN in bits 29:9, and the Page offset in bits 8:0. The PTE physical address is computed as SBR (the system page table base physical address) + 4×VPN. Getting the PTE supplies the PFN, which is concatenated with the Page offset to form the physical address.]

Page 24

System Space Address Translation (cont)

[Figure: the VPN (bits 29:9, with bits 31:30 = 10) is multiplied by 4 and added to SBR to locate the PTE; the PTE yields the PFN, which replaces the VPN to form the physical address (offset unchanged).]
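The S0 translation on these two slides can be sketched as follows; the SBR value and the page table contents are made up for illustration:

```python
# Sketch of VAX S0 translation: PTE address = SBR + 4*VPN; physical
# address = PFN concatenated with the 9-bit page offset.
PAGE_BITS = 9                  # 512-byte pages
SBR = 0x4000                   # hypothetical system page table base (physical)
sys_page_table = {3: 0x1F}     # VPN -> PFN, illustrative contents

def s0_translate(vaddr):
    vpn = (vaddr >> PAGE_BITS) & ((1 << 21) - 1)   # bits 29:9
    offset = vaddr & ((1 << PAGE_BITS) - 1)        # bits 8:0
    pte_paddr = SBR + 4 * vpn                      # where the PTE lives
    pfn = sys_page_table[vpn]                      # "read" the PTE
    return (pfn << PAGE_BITS) | offset, pte_paddr
```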

Page 25

P0 Space Address Translation

[Figure: a P0-space Virtual Address has bits 31:30 = 00, the VPN in bits 29:9, and the Page offset in bits 8:0. The PTE's S0-space virtual address is computed as P0BR (the P0 page table base virtual address) + 4×VPN. The PTE is fetched using the system space translation algorithm; it supplies the PFN, which is concatenated with the Page offset to form the physical address.]

Page 26

P0 Space Address Translation (cont)

[Figure: P0BR + 4×VPN gives a virtual address in S0 space (bits 31:30 = 10, VPN′, Offset′). That S0 address is itself translated: SBR + 4×VPN′ gives the physical address of a system PTE, which yields PFN′; PFN′ concatenated with Offset′ is the physical address of the P0 PTE, which finally yields the PFN for the original address (original Offset unchanged).]

Page 27

P0 Space Address Translation Using TLB

[Flowchart: for a P0 virtual address (00, VPN, offset), first access the process TLB. On a process-TLB hit, get the PTE of the requested page from the process TLB and calculate the physical address (PFN). On a miss, calculate the PTE virtual address (in S0): P0BR + 4×VPN, and access the system TLB. On a system-TLB hit, get the PTE from the system TLB; on a miss, access the system page table at SBR + 4×VPN to get that PTE. Then a memory access fetches the PTE of the requested page from the process page table; finally, calculate the physical address (PFN) and access memory.]

Page 28

Paging in x86

2-level hierarchical mapping
– Page directory and page tables
– All pages and page tables are 4K

The linear address is divided into:
– Dir: 10 bits
– Table: 10 bits
– Offset: 12 bits
– Dir/Table serve as indices into the page directory / a page table
– Offset serves as a pointer into a data page

A page directory entry points to a page table; a page table entry points to a page

Performance issue: TLB

[Figure: CR3 points to the 4K Page Directory; the DIR field (bits 31:22) selects a Dir Entry, which points to a 4K Page Table; the TABLE field (bits 21:12) selects a PG Tbl Entry, which points to a 4K Page Frame; the OFFSET field (bits 11:0) selects the Operand within the frame.]
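The 10/10/12 split of the linear address can be sketched as code; the dicts here stand in for the real 4K directory and table structures, and the mapping values are made up:

```python
# The x86 DIR/TABLE/OFFSET split and two-level walk, as a sketch.
def split_linear(la):
    # (DIR bits 31:22, TABLE bits 21:12, OFFSET bits 11:0)
    return (la >> 22) & 0x3FF, (la >> 12) & 0x3FF, la & 0xFFF

page_dir = {1: {2: 0x345}}    # DIR -> page table; TABLE -> frame number

def walk(la):
    d, t, off = split_linear(la)
    frame = page_dir[d][t]    # two memory references in real hardware
    return (frame << 12) | off
```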

Page 29

x86 Page Translation Mechanism

CR3 points to the current page directory (may be changed per process)

Usually, a page directory entry (covers 4MB) points to a page table that covers data of the same type/usage

Can allocate different physical pages for the same linear address (e.g., 2 copies of the same code)

Sharing can alias pages from different processes to the same physical page (e.g., OS)

[Figure: CR3 → Page Dir → Page Tables → 4K pages in physical memory, grouped as Code, Data, Stack, and OS.]

Page 30

x86 Page Entry Format

20-bit pointer to a 4K-aligned address

12 bits of flags

Virtual memory
– Present
– Accessed, Dirty

Protection
– Writable (R#/W)
– User (U/S#)
– 2 levels/types only

Caching
– Page Write-Through (PWT)
– Page Cache Disabled (PCD)

3 bits for OS usage

[Figure 11-14: Format of Page Directory and Page Table Entries for 4K Pages.
Page Directory Entry: Page Frame Address 31:12, AVAIL (available for OS use), Page Size (0: 4 Kbyte), A (Accessed), PCD (Cache Disable), PWT (Write-Through), U (User), W (Writable), P (Present).
Page Table Entry: Page Frame Address 31:12, AVAIL, D (Dirty), A (Accessed), PCD, PWT, U, W, P.
Bits marked "–" are reserved by Intel for future use (should be zero).]
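The low flag bits drawn in Figure 11-14 (P, W, U, PWT, PCD, A, D in bits 0–6) can be decoded with a small sketch:

```python
# Decoding the low flag bits of a 4K x86 page-table entry.
FLAGS = [("P", 0), ("W", 1), ("U", 2), ("PWT", 3),
         ("PCD", 4), ("A", 5), ("D", 6)]

def decode_pte(pte):
    frame = pte & 0xFFFFF000      # 20-bit page frame address, bits 31:12
    flags = {name: bool((pte >> bit) & 1) for name, bit in FLAGS}
    return frame, flags
```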

Page 31

x86 Paging – Virtual Memory

A page can be
– Not yet loaded
– Loaded
– On disk

A loaded page can be
– Dirty
– Clean

When a page is not loaded (P bit clear) ⇒ a page fault occurs
– It may require throwing out a loaded page to make room for the new one
    The OS prioritizes eviction by LRU and the dirty/clean/avail bits
    A dirty page must be written to disk; a clean page need not
– The new page is either loaded from disk or "initialized"
– The CPU sets the page's "accessed" flag when it is accessed, and "dirty" when it is written

Page 32

Virtually-Addressed Cache

The cache uses virtual addresses (tags are virtual)

Address translation is required only on a cache miss
– The TLB is not in the path to a cache hit

Aliasing: 2 different virtual addresses mapped to the same physical address
– Two different cache entries holding data for the same physical address
– Must update all cache entries with the same physical address

[Figure: the CPU issues a VA to the Cache; on a hit, data returns directly; on a miss, Translation produces a PA used to access Main Memory.]

Page 33

Virtually-Addressed Cache (cont)

The cache must be flushed at a task switch
– Solution: include the process ID (PID) in the tag

How to share memory among processes
– Permit multiple virtual pages to refer to the same physical frame

Problem: incoherence if they point to different physical pages
– Solution: require sufficiently many common virtual LSBs
– With a direct-mapped cache, it is then guaranteed that they all point to the same physical page
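The "common virtual LSBs" condition can be sketched numerically: in a direct-mapped virtual cache, two aliases land in the same cache line only if they agree in all index + offset bits. The cache size here is hypothetical:

```python
# Alias condition in a direct-mapped virtually-indexed cache:
# two virtual addresses select the same line iff they agree modulo
# the cache size (i.e., share the index and offset LSBs).
CACHE_BYTES = 2 ** 14          # 16 KB direct-mapped, made-up size

def same_cache_line(va1, va2):
    return va1 % CACHE_BYTES == va2 % CACHE_BYTES

# Aliases the OS aligned on a cache-size boundary map to one line:
print(same_cache_line(0x10040, 0x54040))   # True
```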

Page 34

Backup

Page 35

Inverted Page Tables

[Figure: the Virtual Page number is hashed to index the inverted page table; each entry holds a V.Page / P.Frame pair, and the stored virtual page is compared (=) against the lookup.]

IBM System 38 (AS400) implements 64-bit addresses
– 48 bits translated
– start of object contains a 12-bit tag

⇒ TLBs or virtually addressed caches are critical

Page 36

Hardware / Software Boundary

What aspects of the Virtual → Physical translation are determined in hardware?
– TLB format
– Type of page table
– Page table entry format
– Disk placement
– Paging policy

Page 37

Why Virtual Memory?

Generality
– ability to run programs larger than the size of physical memory

Storage management
– allocation/deallocation of variable-sized blocks is costly and leads to (external) fragmentation

Protection
– regions of the address space can be R/O, Ex, …

Flexibility
– portions of a program can be placed anywhere, without relocation

Storage efficiency
– retain only the most important portions of the program in memory

Concurrent I/O
– execute other processes while loading/dumping a page

Expandability
– can leave room in the virtual address space for objects to grow

Performance

Page 38

Address Translation with a TLB

[Figure: the virtual address (bits n–1 … 0) is split into a virtual page number (bits n–1:p) and a page offset (bits p–1:0). The virtual page number is matched (=) against TLB entries (valid, tag, physical page number); a TLB hit yields the physical address, which is split into tag, index, and byte offset for the cache (valid, tag, data); a tag match (=) signals a cache hit and returns the data.]