VM Design Issues Vivek Pai / Kai Li Princeton University.


VM Design Issues

Vivek Pai / Kai Li, Princeton University

2

Mini-Gedankenexperimenten

What’s the refresh rate of your monitor?
What is the access time of a hard drive?
What response time determines sluggishness or speediness?
What’s the relation?

What determines the running speed of a program that’s paging heavily?

If you have a program that pages heavily, what are your options to improve the situation?

3

Mechanics

Let’s finish off last lecture
Memory mapping, Unified VM next time
  No assigned reading yet, may not exist
Mid-term on track
  Covers everything before it
Open Q&A session?
  Is there interest? If so, when?

4

Where We Left Off Last Time

Various approaches to evicting pages
Some discussion about why doing even “well” is hard to implement
Belady’s algorithm for off-line analysis
We just finished variations on FIFO
  In particular, enhanced FIFO with 2nd chance

5

Lessons From Enhanced FIFO

Observation: it’s easier to evict a clean page than a dirty page

2nd observation: sometimes the disk and CPU are idle

Optimization: when system’s free, write dirty pages back to disk, but don’t evict

Called flushing – often falls to pager daemon

6

Least Recently Used (LRU)

Algorithm
  Replace the page that hasn’t been used for the longest time
Question
  What hardware mechanisms are required to implement LRU?

7

Implementing LRU

Perfect
  Use a timestamp on each reference
  Keep a list of pages ordered by time of reference

[Figure: page list 5 3 4 7 9 11 2 1 15, with its ends labeled “Most recently used” and “Least recently used”]
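A minimal sketch of the "perfect" scheme, assuming software sees every reference (which real hardware does not give us cheaply). The class name `PerfectLRU` and its methods are hypothetical; an ordered dictionary stands in for the list ordered by time of reference.

```python
from collections import OrderedDict


class PerfectLRU:
    """Exact LRU over a fixed number of frames (illustrative helper)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pages = OrderedDict()  # oldest reference first, newest last

    def reference(self, page):
        """Touch a page; return the evicted page number, or None."""
        if page in self.pages:
            self.pages.move_to_end(page)  # becomes the most recently used
            return None
        victim = None
        if len(self.pages) >= self.num_frames:
            # Evict from the least recently used end of the list.
            victim, _ = self.pages.popitem(last=False)
        self.pages[page] = True
        return victim


lru = PerfectLRU(3)
evicted = [lru.reference(p) for p in [1, 2, 3, 1, 4, 5]]
```

Every reference updates the list, which is why this is too expensive to do per memory access and why the approximations that follow exist.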

8

Approximate LRU

[Figure: spectrum from most recently used to least recently used, at three resolutions]
  LRU: N categories, pages in order of last reference
  8-bit count: 256 categories (0, 1, 2, 3, ..., 254, 255)
  Crude LRU: 2 categories
    pages referenced since the last page fault
    pages not referenced since the last page fault

9

Aging: Not Frequently Used (NFU)

Algorithm
  Shift reference bits into counters
  Pick the page with the smallest counter
Main difference between NFU and LRU?
  NFU has a short history (counter length)
How many bits are enough?
  In practice 8 bits are quite good
Pros: Requires one reference bit
Cons: Requires looking at all counters

Counters for four pages (rows) at five successive ticks (columns):

00000000  10000000  01000000  10100000  01010000
00000000  00000000  10000000  01000000  10100000
10000000  11000000  11100000  01110000  00111000
00000000  00000000  00000000  10000000  01000000
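The counter values above can be reproduced with a short sketch. The per-tick reference bits in `ticks` are inferred from those values (they are not given on the slide), and the function name `age` is made up for illustration.

```python
def age(counters, ref_bits, width=8):
    """Shift each counter right and put the page's reference bit in the top bit."""
    top = 1 << (width - 1)
    return [(c >> 1) | (top if r else 0) for c, r in zip(counters, ref_bits)]


# Reference bits of four pages at each tick (inferred, see lead-in).
ticks = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
]

counters = [0, 0, 0, 0]
for bits in ticks:
    counters = age(counters, bits)

final = [format(c, "08b") for c in counters]
victim = counters.index(min(counters))  # page with the smallest counter
```

After the fifth tick the third page has the smallest counter, so it is the eviction candidate even though it was the most heavily used early on. That is the short history at work.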

10

Where Do We Get Storage?

32 bit VA to 32 bit PA – no space, right?
Offset within page is the same
  No need to store offset
  4KB page = 12 bits of offset
  Those 12 bits are “free” in PTE
Page # + other info <= 32 bits
  Makes storing info easy
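A small worked example of the split, assuming a flat one-level table for simplicity (real x86 uses two levels). The mapping in `page_table` is made up.

```python
PAGE_SIZE = 4096              # 4KB page
OFFSET_BITS = 12              # log2(4096)
OFFSET_MASK = PAGE_SIZE - 1   # low 12 bits


def split(va):
    """Return (virtual page number, offset within page)."""
    return va >> OFFSET_BITS, va & OFFSET_MASK


def translate(va, page_table):
    """Swap the page number for a frame number; the offset passes through unchanged."""
    vpn, offset = split(va)
    pfn = page_table[vpn]
    return (pfn << OFFSET_BITS) | offset


page_table = {0x12345: 0x00ABC}   # hypothetical mapping
pa = translate(0x12345678, page_table)
```

Only the 20-bit frame number has to be stored per entry, which leaves the low 12 bits of a 32-bit PTE for flags.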

11

x86 Page Table Entry

Valid

Writable

Owner (user/kernel)

Write-through

Cache disabled

Accessed (referenced)

Dirty

PDE maps 4MB

Global

[Figure: 32-bit PTE layout. Bits 31-12 hold the page frame number; the low 12 bits hold reserved bits and the flags Gl, L, D, A, Cd, Wt, O, W, V]
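A sketch of pulling the fields out of a 32-bit entry. The flag abbreviations follow the slide; the bit positions are the standard 32-bit x86 ones (present/valid in bit 0 up to global in bit 8). The function name and the sample entry are made up.

```python
# (slide abbreviation, bit position) for a 32-bit x86 PTE
FLAGS = [
    ("V", 0),    # valid (present)
    ("W", 1),    # writable
    ("O", 2),    # owner: user/kernel
    ("Wt", 3),   # write-through
    ("Cd", 4),   # cache disabled
    ("A", 5),    # accessed (referenced)
    ("D", 6),    # dirty
    ("L", 7),    # large page (PDE maps 4MB)
    ("Gl", 8),   # global
]


def decode_pte(pte):
    """Return the flag bits and the page frame number of one entry."""
    fields = {name: bool((pte >> bit) & 1) for name, bit in FLAGS}
    fields["pfn"] = pte >> 12   # bits 31-12
    return fields


# Hypothetical entry: frame 0xABC, valid, writable, accessed, dirty.
entry = decode_pte((0x00ABC << 12) | 0x63)
```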

12

What Happens on Diagonal Lines

My screen is 1024*768 pixels
  256 colors = 1 byte per pixel = .75MB
  64K colors = 2 bytes/pixel = 1.5MB
Page size is 4KB
  Screen is 192 or 384 pages
1 page = several horizontal lines
Diagonal/vertical lines = TLB badness
“Superpages” to the rescue
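The arithmetic can be checked by counting distinct pages per line, assuming a linear frame buffer that starts on a page boundary with no padding between rows (an assumption, not stated on the slide). `pages_touched` is a made-up helper.

```python
PAGE_SIZE = 4096
WIDTH, HEIGHT = 1024, 768


def pages_touched(pixels, bytes_per_pixel):
    """Count distinct pages hit when drawing the given (x, y) pixels."""
    row_bytes = WIDTH * bytes_per_pixel
    return len({(y * row_bytes + x * bytes_per_pixel) // PAGE_SIZE
                for x, y in pixels})


horizontal = [(x, 100) for x in range(WIDTH)]   # one full scan line
vertical = [(100, y) for y in range(HEIGHT)]    # one full column
diagonal = [(y, y) for y in range(HEIGHT)]      # 45-degree line

h1 = pages_touched(horizontal, 1)   # 1 page
v1 = pages_touched(vertical, 1)     # 192 pages, the whole screen's worth
d1 = pages_touched(diagonal, 1)     # 192 pages
v2 = pages_touched(vertical, 2)     # 384 pages at 2 bytes/pixel
```

A vertical or diagonal line touches every page of the screen, far more than a typical TLB can hold, while a superpage can cover the whole frame buffer with one entry.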

13

The Big Picture

We’ve talked about single evictions
Most computers are multiprogrammed
  Single eviction decision still needed
  New concern – allocating resources
  How to be “fair enough” and achieve good overall throughput
  This is a competitive world – local and global resource allocation decisions

14

Program Behaviors

80/20 rule
  > 80% of memory references are made by < 20% of code
Locality
  Spatial and temporal
Working set
  Keeping a set of pages in memory would avoid a lot of page faults

[Figure: # page faults versus # pages in memory, with the working set marked on the curve]

15

Observations re Working Set

Working set isn’t static
There often isn’t a single “working set”
  Multiple plateaus in previous curve
  Program coding style affects working set
Working set is hard to gauge
  What’s the working set of an interactive program?

16

Working Set

Main idea
  Keep the working set in memory
An algorithm
  On a page fault, scan through all pages of the process
    If the reference bit is 1, record the current time for the page
    If the reference bit is 0, check the “last use time”
      If the page has not been used within the time window, replace the page
      Otherwise, go to the next
  Add the faulting page to the working set
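A minimal sketch of the scan, with assumptions labeled: the time window is a parameter called `tau` here, the reference bit is cleared after its time is recorded, and the scan continues past the first stale page so every referenced page gets its time updated. None of these details are on the slide.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False
    last_use: int = 0


def working_set_fault(pages, faulting, now, tau):
    """Handle one fault: scan resident pages, evict one stale page, add the faulting page."""
    victim = None
    for page in pages:
        if page.referenced:
            page.last_use = now        # record the current time
            page.referenced = False    # assumption: clear the bit for the next scan
        elif victim is None and now - page.last_use > tau:
            victim = page              # not used within the window
    if victim is not None:
        pages.remove(victim)
    pages.append(Page(faulting, referenced=True, last_use=now))
    return victim


pages = [Page(1, True, 0), Page(2, False, 10), Page(3, False, 95)]
victim = working_set_fault(pages, faulting=4, now=100, tau=50)
```

The cost is visible in the loop: every fault walks all pages of the process, which is the motivation for the clock-based variant.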

17

WSClock Paging Algorithm

Follow the clock hand
If the reference bit is 1, set reference bit to 0, set the current time for the page and go to the next
If the reference bit is 0, check “last use time”
  If page has been used within the time window, go to the next
  If page hasn’t been used within the time window and modify bit is 1
    Schedule the page for page out and go to the next
  If page hasn’t been used within the time window and modify bit is 0
    Replace this page
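The hand movement can be sketched as below. The window parameter `tau`, the bound of two sweeps, and the returned tuple are assumptions made for the sketch; a real pager would also wait for scheduled writes to finish.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int
    referenced: bool
    modified: bool
    last_use: int


def wsclock(frames, hand, now, tau):
    """Advance the hand to a victim. Returns (victim index or None, new hand, pages scheduled for page-out)."""
    scheduled = []
    for _ in range(2 * len(frames)):          # bounded sweep (assumption)
        f = frames[hand]
        if f.referenced:
            f.referenced = False              # give it another chance
            f.last_use = now
        elif now - f.last_use > tau:
            if f.modified:
                if f.page not in scheduled:
                    scheduled.append(f.page)  # schedule page-out, keep going
            else:
                return hand, (hand + 1) % len(frames), scheduled
        hand = (hand + 1) % len(frames)
    return None, hand, scheduled


frames = [
    Frame(10, True, False, 0),    # referenced: skipped, time updated
    Frame(11, False, True, 5),    # old and dirty: scheduled for write
    Frame(12, False, False, 90),  # recently used: skipped
    Frame(13, False, False, 20),  # old and clean: victim
]
victim, hand, scheduled = wsclock(frames, hand=0, now=100, tau=50)
```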

18

Simulating Modify Bit with Access Bits

Set pages read-only if they are read-write

Use a reserved bit to remember if the page is really read-only

On a write fault (a write to a page marked read-only)
  If it is not really read-only, then record a modify in the data structure and change it to read-write
  Restart the instruction
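A toy model of the trick, assuming the fault handler can tell a genuine protection violation from a first write. `SoftPTE` and `write` are hypothetical names, and the hardware fault is modeled as a plain `if`.

```python
class SoftPTE:
    """Page table entry for hardware without a modify bit (illustrative)."""

    def __init__(self, really_read_only):
        self.really_read_only = really_read_only  # the reserved bit
        self.hw_writable = False                  # every page starts read-only
        self.modified = False                     # software-maintained modify bit


def write(pte):
    """Simulate a store to the page; returns 'ok' or 'protection error'."""
    if not pte.hw_writable:                       # hardware raises a write fault
        if pte.really_read_only:
            return "protection error"             # genuine violation
        pte.modified = True                       # record the modify
        pte.hw_writable = True                    # change it to read-write
        # Restart the instruction: the retried store now succeeds.
    return "ok"


data_page = SoftPTE(really_read_only=False)
code_page = SoftPTE(really_read_only=True)
first = write(data_page)
second = write(data_page)   # no fault this time
denied = write(code_page)
```

Only the first write to each page pays for a fault; later writes run at full speed until the pager resets the page to read-only.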

19

Implementing LRU without Reference Bit

Some machines have no reference bit
  VAX, for example
Use the valid bit or access bit to simulate
  Invalidate all valid bits (even if they are valid)
  Use a reserved bit to remember if a page is really valid
On a page fault
  If it is a valid reference, set the valid bit and place the page in the LRU list
  If it is an invalid reference, do the page replacement
  Restart the faulting instruction
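A sketch of the same idea in code, assuming the set `really_valid` plays the role of the reserved bit. The class `Pager` and its return strings are made up for illustration.

```python
from collections import OrderedDict


class Pager:
    """Simulates a reference bit by clearing hardware valid bits (illustrative)."""

    def __init__(self, resident):
        self.really_valid = set(resident)  # reserved bit: page is in memory
        self.hw_valid = set()              # hardware valid bits, all cleared
        self.lru = OrderedDict()           # pages in order of observed use
        self.faults = 0

    def clear_valid_bits(self):
        """Invalidate all valid bits so the next touch of each page faults."""
        self.hw_valid.clear()

    def access(self, page):
        if page in self.hw_valid:
            return "hit"                   # no fault, so no LRU update
        self.faults += 1
        if page in self.really_valid:
            self.hw_valid.add(page)        # set the valid bit
            self.lru[page] = True
            self.lru.move_to_end(page)     # place the page in the LRU list
            return "soft fault"
        return "replacement needed"        # truly not resident


pager = Pager({1, 2, 3})
results = [pager.access(p) for p in [1, 1, 2, 9]]
```

The LRU order is only as fine-grained as the interval between calls to `clear_valid_bits`, so this trades accuracy against the number of soft faults.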

20

Demand Paging

Pure demand paging relies only on faults to bring in pages

Problems?
  Possibly lots of faults at startup
  Ignores spatial locality
Remedies
  Loading groups of pages per fault
  Prefetching/preloading
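A toy fault counter shows the effect of loading groups, assuming unlimited memory and groups aligned to multiples of the group size (both simplifications). `count_faults` is a made-up helper.

```python
def count_faults(refs, group=1):
    """Faults for a reference string when each fault loads `group` aligned consecutive pages."""
    resident = set()
    faults = 0
    for page in refs:
        if page not in resident:
            faults += 1
            start = page - page % group
            resident.update(range(start, start + group))
    return faults


startup = list(range(16))                  # sequential touch of 16 pages at startup
pure_demand = count_faults(startup, 1)     # one fault per page
grouped = count_faults(startup, 4)         # four pages per fault
```

The gain depends entirely on spatial locality; for a random reference string the extra pages would mostly be wasted I/O.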

21

Speed and Sluggishness

Slow is > .1 seconds (100 ms)
Speedy is << .1 seconds
Monitors tend to be 60+ Hz = <16.7ms between screen paints
Disks have seek + rotational delay
  Seek is somewhere between 7-16 ms
  At 7200rpm, one rotation = 1/120 sec = about 8ms. Half-rotation is about 4ms
Conclusion? One disk access OK, six are bad
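The conclusion follows from a few lines of arithmetic, using only the numbers above and ignoring transfer time (an assumption). Variable names are illustrative.

```python
RPM = 7200
rotation_ms = 60_000 / RPM            # one full rotation
half_rotation_ms = rotation_ms / 2    # average rotational delay
frame_ms = 1000 / 60                  # time between screen paints at 60 Hz


def access_ms(seek_ms):
    """One disk access = seek + average rotational delay."""
    return seek_ms + half_rotation_ms


best, worst = access_ms(7), access_ms(16)
six_best, six_worst = 6 * best, 6 * worst
```

One access costs roughly one screen refresh. Six back-to-back accesses land somewhere between about 67 ms and 121 ms, which is at the edge of the 100 ms "slow" threshold.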

22

Disk Address

Use physical memory as a cache for disk

Where to find a page on a page fault?
  PPage# field is a disk address

[Figure labels: Virtual address space, invalid, Physical memory]

23

Imagine a Global LRU

Global – across all processes
Idea – when a page is needed, pick the oldest page in the system
Problems? Process mixes?
  Interactive processes
  Active large-memory sweep processes
Mitigating damage?

24

Amdahl’s Law

Gene Amdahl (IBM, then Amdahl)
Noticed the bottlenecks to speedup
Assume speedup affects one component
New time = (1 - affected) + affected/speedup
In other words, diminishing returns
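A quick numeric check of the diminishing returns, with the old time normalized to 1 and `affected` as the fraction of time the sped-up component accounts for. Function names are made up.

```python
def new_time(affected, speedup):
    """Normalized running time after speeding up one component."""
    return (1 - affected) + affected / speedup


def overall_speedup(affected, speedup):
    return 1 / new_time(affected, speedup)


# A component that accounts for half the time:
doubled = overall_speedup(0.5, 2)        # about 1.33x overall
near_limit = overall_speedup(0.5, 1e9)   # approaches 2x, never exceeds it
```

For a program that pages heavily, the time spent waiting on disk is the large component, so a faster CPU barely moves the total.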

25

NT x86 Virtual Address Space Layouts

Layout 1
  00000000 to 7FFFFFFF: Application code, globals, per-thread stacks, DLL code
  80000000: Kernel & exec, HAL, boot drivers
  C0000000: Process page tables, hyperspace
  C0800000 to FFFFFFFF: System cache, paged pool, nonpaged pool

Layout 2
  00000000 to BFFFFFFF: 3-GB user space
  C0000000 to FFFFFFFF: 1-GB system space

26

Virtual Address Space in Win95 and Win98

00000000 to 7FFFFFFF: Unique per process (per application), user mode
80000000 to BFFFFFFF: Shared, process-writable (DLLs, shared memory, Win16 applications); systemwide, user mode
C0000000 to FFFFFFFF: Operating system (Ring 0 components); systemwide, kernel mode

User accessible: 00000000 to BFFFFFFF

27

Details with VM Management

Create a process’s virtual address space
  Allocate page table entries (reserve in NT)
  Allocate backing store space (commit in NT)
  Put related info into PCB
Destroy a virtual address space
  Deallocate all disk pages (decommit in NT)
  Deallocate all page table entries (release in NT)
  Deallocate all page frames

28

Page States (NT)

Active: Part of a working set and a PTE points to it
Transition: I/O in progress (not in any working sets)
Standby: Was in a working set, but removed. A PTE points to it, not modified and invalid.
Modified: Was in a working set, but removed. A PTE points to it, modified and invalid.
Modified no write: Same as modified but no write back
Free: Free with non-zero content
Zeroed: Free with zero content
Bad: hardware errors

29

Dynamics in NT VM

[Figure: page flow among the process working set, standby list, modified list, free list, zero list, and bad list. Labeled transitions: working set replacement, “soft” faults, modified writer, zero thread, page in or allocation, demand zero fault]

30

Shared Memory

How to destroy a virtual address space?
  Link all PTEs
  Reference count
How to swap out/in?
  Link all PTEs
  Operation on all entries
How to pin/unpin?
  Link all PTEs
  Reference count

[Figure: page tables of Process 1 and Process 2, each with an entry marked w, mapped onto physical pages]

31


Copy-On-Write

Child’s virtual address space uses the same page mapping as parent’s
Make all pages read-only
Make child process ready
On a read, nothing happens
On a write, generates an access fault
  map to a new page frame
  copy the page over
  restart the instruction

[Figure: page tables of the parent process and the child process, each with entries marked r, mapped onto physical pages]
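The fork and write-fault steps can be sketched as a toy model. The per-frame reference count is borrowed from the shared memory slide as one plausible bookkeeping choice; the class and method names are made up, and the access fault is modeled as a plain `if`.

```python
class Frame:
    """A physical page frame with a reference count."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.table = {}   # vpn -> [frame, writable]

    def fork(self):
        """Child shares every frame; both mappings become read-only."""
        child = AddressSpace()
        for vpn, entry in self.table.items():
            entry[1] = False
            entry[0].refs += 1
            child.table[vpn] = [entry[0], False]
        return child

    def read(self, vpn, i):
        return self.table[vpn][0].data[i]   # on a read, nothing happens

    def write(self, vpn, i, value):
        entry = self.table[vpn]
        if not entry[1]:                    # access fault on write
            frame = entry[0]
            if frame.refs > 1:
                frame.refs -= 1
                frame = Frame(frame.data)   # new frame, copy the page over
            entry[0], entry[1] = frame, True
        entry[0].data[i] = value            # restarted instruction succeeds


parent = AddressSpace()
parent.table[0] = [Frame(b"abcd"), True]
child = parent.fork()
shared_before = child.table[0][0] is parent.table[0][0]
child.write(0, 0, ord("X"))
```

When the count drops back to one, the last holder can simply be made writable again without a copy, which is what the `refs > 1` test does.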

32

Issues of Copy-On-Write

How to destroy an address space
  Same as shared memory case?
How to swap in/out?
  Same as shared memory
How to pin/unpin
  Same as shared memory