1 Virtual Memory. 2 Outline Pentium/Linux Memory System Core i7 Suggested reading: 9.6, 9.7.
-
Upload
judith-strickland -
Category
Documents
-
view
222 -
download
0
Transcript of 1 Virtual Memory. 2 Outline Pentium/Linux Memory System Core i7 Suggested reading: 9.6, 9.7.
3
bus interface unit
DRAMexternal
system bus (e.g. PCI)
instruction
fetch unit
L1
i-cache
L2
cache
cache bus
L1
d-cache
inst
TLB
data
TLB
processor package
• 32 bit address space• 4 KB page size• L1, L2, and TLBs
• 4-way set associative• inst TLB
• 32 entries• 8 sets
• data TLB• 64 entries• 16 sets
• L1 i-cache and d-cache• 16 KB• 32 B line size• 128 sets
• L2 cache• unified• 128 KB -- 2 MB• 32 B line size
P6 Memory System
4
Review of Address Translation
• Basic Parameters– N = 2n = Virtual address limit– M = 2m = Physical address limit– P = 2p = page size (bytes).
• Components of the virtual address (VA)– VPO: Virtual page offset – VPN: Virtual page number
• Components of the physical address (PA)– PPO: Physical page offset (same as VPO)– PPN: Physical page number
5
Review of Address Translation
virtual page number page offset virtual address
physical page number page offset physical address
0p–1
address translation
pm–1
n–1 0p–1p
TLB tag TLB index
Cache tag Cache index Cache offset
6
Review of Address Translation
• Components of the virtual address (VA)– VPO: Virtual page offset – VPN: Virtual page number – TLBI: TLB index– TLBT: TLB tag
• Components of the physical address (PA)– PPO: Physical page offset (same as VPO)– PPN: Physical page number– CO: Byte offset within cache line– CI: Cache index– CT: Cache tag
7
Multi-Level Page Tables
• multi-level page tables
• e.g., 2-level page table
– Level 1 tableLevel 1 table: 1024
entries, each of which
points to a Level 2 page
table.
– Level 2 tableLevel 2 table: 1024
entries, each of which
points to a page
Level 1
Table
...
Level 2
Tables
8
CPU
VPN VPO20 12
TLBT TLBI416
virtual address (VA)
...TLB (16 sets, 4 entries/set)VPN1 VPN2
1010
PDE PTE
PDBR
PPN PPO20 12
Page tables
TLB
miss
TLB
hit
physical
address (PA)
result32
...
CT CO20 5
CI7
L2 and DRAM
L1 (128 sets, 4 lines/set)
L1
hit
L1
miss
P6 Address Translation
9
• Page directory – 1024 4-byte page directory
entries (PDEs) that point to page tables
– one page directory per process.– page directory must be in
memory when its process is running
– always pointed to by PDBR
• Page tables:– 1024 4-byte page table entries
(PTEs) that point to pages.– page tables can be paged in and
out.
page directory
...
Up to 1024 page tables
1024PTEs
1024PTEs
1024PTEs
...
1024PDEs
P6 Page Table
10
Page table physical base addr Avail G PS A CD WT U/S R/W P=1
Page table physical base address: 20 most significant bits of physical page table address (forces page tables to be 4KB aligned)
Avail: available for system programmers
G: global page (don’t evict from TLB on task switch)
PS: page size 4K (0) or 4M (1)
A: accessed (set by MMU on reads and writes, cleared by software)
CD: cache disabled (1) or enabled (0)
WT: write-through or write-back cache policy for this page table
U/S: user or supervisor mode access
R/W: read-only or read-write access
P: page table is present in memory (1) or not (0)
31 12 11 9 8 7 6 5 4 3 2 1 0
Available for OS (page table location in secondary storage) P=0
31 01
P6 page directory entry (PDE)
11
Page physical base address Avail G 0 D A CD WT U/S R/W P=1
Page base address: 20 most significant bits of physical page address (forces pages to be 4 KB aligned)
Avail: available for system programmers
G: global page (don’t evict from TLB on task switch)
D: dirty (set by MMU on writes)
A: accessed (set by MMU on reads and writes)
CD: cache disabled or enabled
WT: write-through or write-back cache policy for this page
U/S: user/supervisor
R/W: read/write
P: page is present in physical memory (1) or not (0)
31 12 11 9 8 7 6 5 4 3 2 1 0
Available for OS (page location in secondary storage) P=0
31 01
P6 page table entry (PTE)
12
PDE
PDBRphysical address
of page table base
(if P=1)
physical
address
of page base
(if P=1)physical address
of page directory
word offset into
page directory
word offset into
page table
page directory page table
VPN1
10VPO
10 12VPN2 Virtual address
PTE
PPN PPO
20 12Physical address
word offset into
physical and virtual
page
Page tables Translation
13
CPU
VPN VPO20 12
TLBT TLBI416
virtual address (VA)
...TLB (16 sets, 4 entries/set)VPN1 VPN2
1010
PDE PTE
PDBR
PPN PPO20 12
Page tables
TLB
miss
TLB
hit
physical
address (PA)
result32
...
CT CO20 5
CI7
L2 and DRAM
L1 (128 sets, 4 lines/set)
L1
hit
L1
miss
P6 TLB Translation
14
• TLB entry (not all documented, so this is speculative):
– V: indicates a valid (1) or invalid (0) TLB entry– PD: is this entry a PDE (1) or a PTE (0)?– tag: disambiguates entries cached in the same set– PDE/PTE: page directory or page table entry
• Structure of the data TLB:– 16 sets, 4 entries/set
PDE/PTE Tag PD V
1 11632
entry entry entry entryentry entry entry entryentry entry entry entry
entry entry entry entry
...
set 0set 1set 2
set 15
P6 TLB
15
1. Partition VPN into TLBT
and TLBI.
2. Is the PTE for VPN
cached in set TLBI?
3. Yes: then build
physical address.
4. No: then read PTE
(and PDE if not
cached) from memory
and build physical
address.
CPU
VPN VPO20 12
TLBT TLBI416
virtual address
PDE PTE
...TLB
miss
TLB
hit
page table translation
PPN PPO20 12
physical address
1 2
3
4
Translating with the P6 TLB
16
CPU
VPN VPO20 12
TLBT TLBI416
virtual address (VA)
...TLB (16 sets, 4 entries/set)VPN1 VPN2
1010
PDE PTE
PDBR
PPN PPO20 12
Page tables
TLB
miss
TLB
hit
physical
address (PA)
result32
...
CT CO20 5
CI7
L2 and DRAM
L1 (128 sets, 4 lines/set)
L1
hit
L1
miss
P6 Page Table Translation
17
• Case 1/1: page table and page present.
• MMU Action: – MMU build
physical address and fetch data word.
• OS action
– none
VPN
VPN1 VPN2
PDE
PDBR
PPN PPO20 12
20VPO
12
p=1 PTE p=1
Data page
data
Page directory
Page table
Mem
Disk
Translating with the P6 page tables (case 1/1)
18
• Case 1/0: page table present but page missing.
• MMU Action: – page fault
exception– handler receives
the following args:
• VA that caused fault
• fault caused by non-present page or page-level protection violation
• read/write• user/supervisor
VPN
VPN1 VPN2
PDE
PDBR
20VPO
12
p=1 PTE
Page directory
Page table
Mem
DiskData page
data
p=0
Translating with the P6 page tables (case 1/0)
19
• OS Action: – Check for a legal
virtual address.– Read PTE through
PDE. – Find free physical
page (swapping out current page if necessary)
– Read virtual page from disk and copy to virtual page
– Restart faulting instruction by returning from exception handler.
VPN
VPN1 VPN2
PDE
PDBR
20VPO
12
p=1 PTE p=1
Page directory
Page table
Data page
data
PPN PPO20 12
Mem
Disk
Translating with the P6 page tables (case 1/0, cont)
20
• Case 0/1: page table missing but page present.
• Introduces consistency issue.
– potentially every page out requires update of disk page table.
• Linux disallows this
– if a page table is swapped out, then swap out its data pages too.
VPN
VPN1 VPN2
PDE
PDBR
20VPO
12
p=0
PTE p=1
Page directory
Page table
Mem
Disk
Data page
data
Translating with the P6 page tables (case 0/1)
21
• Case 0/0: page
table and page
missing.
• MMU Action:
– page fault
exception
VPN
VPN1 VPN2
PDE
PDBR
20VPO
12
p=0
PTE
Page directory
Page table
Mem
DiskData page
datap=0
Translating with the P6 page tables (case 0/0)
22
• OS action:
– swap in page
table.
– restart faulting
instruction by
returning from
handler.
• Like case 0/1 from
here on.
VPN
VPN1 VPN2
PDE
PDBR
20VPO
12
p=1 PTE
Page directory
Page table
Mem
DiskData page
data
p=0
Translating with the P6 page tables (case 0/0, cont)
23
CPU
VPN VPO20 12
TLBT TLBI416
virtual address (VA)
...TLB (16 sets, 4 entries/set)VPN1 VPN2
1010
PDE PTE
PDBR
PPN PPO20 12
Page tables
TLB
miss
TLB
hit
physical
address (PA)
result32
...
CT CO20 5
CI7
L2 and DRAM
L1 (128 sets, 4 lines/set)
L1
hit
L1
miss
P6 L1 Cache Access
24
• Partition physical address into CO, CI, and CT.
• Use CT to determine if line containing word at address PA is cached in set CI.
• If no: check L2.• If yes: extract word
at byte offset CO and return to processor.
physical
address (PA)
data32
...
CT CO20 5
CI7
L2 and DRAM
L1 (128 sets, 4 lines/set)
L1
hit
L1
miss
L1 cache access
25
• Core i7– The 64-bit Nehalem microarchitecture
– Current support 48-bit (256 TB) virtual address space and 52-bit (4 PB) physical address space
– Compatibility mode support 32-bit (4 GB) virtual and physical address spaces
Core i7 Summary
26
Pro
cessor
packag
eCore i7 Memory System
Core x 4
RegistersRegisters Instruction fetch
Instruction fetch
L1 d-cache 32 KB,8-wayL1 d-cache
32 KB,8-wayL1 i-cache
32 KB,8-wayL1 i-cache
32 KB,8-way
L2 unified cache256 KB, 8-way
L2 unified cache256 KB, 8-way
L3 unified cache8 MB, 16-way
(shared by all cores)
L3 unified cache8 MB, 16-way
(shared by all cores)
MMU(addr translation)
MMU(addr translation)
L1 d-TLB 64 entries, 4-way
L1 d-TLB 64 entries, 4-way
L1 i-TLB 64 entries, 4-way
L1 i-TLB 64 entries, 4-way
L2 unified TLB 512 entries, 4-way
L2 unified TLB 512 entries, 4-way
QuickPath interconnect4 links @ 25.6 GB/s
102.4 GB/s total
QuickPath interconnect4 links @ 25.6 GB/s
102.4 GB/s total
DDR3 memory controller3 x 64 bit @ 10.66 GB/s
32 GB/s total (shared by all cores)
DDR3 memory controller3 x 64 bit @ 10.66 GB/s
32 GB/s total (shared by all cores)
Main memoryMain memory
To othercores
To I/Obridge
...
27
...
CPUCPU
VPNVPN VPOVPO36 12
432
Virtual address (VA)
L1 TLB (16 sets, 4 entries/set)
PPNPPN PPOPPO40 12
TLBmiss
TLB hit
physical
address (PA)
resultresult
32/64
CTCT40 66
L2, L3, and main memoryL2, L3, and
main memory
L1 d-cache(64 sets, 8 lines/set)
L1 hitL1 miss
TLBTTLBT TLBITLBI
Core i7 Address Translation
CICI COCOVPN1VPN1 VPN2VPN2 VPN3VPN3 VPN4VPN4
PTEPTE PTE PTE
CR3
Page Tables
9 9 9 9
28
Page table physical base addr Unused G PS A CD WT U/S R/WP=1
XD Disable or enable instruction fetches
Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned)
G global page (don’t evict from TLB on task switch)
PS Page size either 4K(0) or 4M(1) (defined for Level 1 PTEs only)
A Reference bit (set by MMU on reads and writes, cleared by software)
CD Cache disabled(1) or enabled(0) for child page table
WT Write-through or write-back cache policy
U/S User or supervisor(kernel) mode access permission
R/W Read-only or read-write access permissiom
P Child page table present in memory(1) or not(0)
51 1211 9 8 7 6 5 4 3 2 1 0
Available for OS (page table location in secondary storage) P=0
Level 1, Level 2 and Level 3 Page Table Entry
Unused
52
XD
6263
29
Page table physical base addr Unused G D A CD WT U/S R/WP=1
XD Disable or enable instruction fetches
Base addr 40 most significant bits of base address of child page table (forces page tables to be 4KB aligned)
G global page (don’t evict from TLB on task switch)
D Dirty bit (Set by MMU on writes, cleared by software)
A Reference bit (set by MMU on reads and writes, cleared by software)
CD Cache disabled(1) or enabled(0) for child page table
WT Write-through or write-back cache policy
U/S User or supervisor(kernel) mode access permission
R/W Read-only or read-write access permissiom
P Child page table present in memory(1) or not(0)
51 1211 9 8 7 6 5 4 3 2 1 0
Available for OS (page table location in secondary storage) P=0
Level 4 Page Table Entry
Unused
52
XD
6263
30
Physical addressof L1 PT
physical addressof page512 GB
regionper entry
L1 PTPage global
directory
12
Virtual address
40 12 Physical address
Offset into physical and virtual page
Page tables Translation
VPN 1 VPN 2 VPN 3 VPN 4 VPO
9999
PPO
L1 PTE
CR3
1 GBregion
per entry
L2 PTPage upper
directory
L2 PTE
2 MBregion
per entry
L3 PTPage middle
directory
L3 PTE
4 KBregion
per entry
L4 PTPage
directory
L4 PTE
PPO
12
40
9999
40404040
Cute Trick for Speeding Up L1 Access
• Observation– Bits that determine CI identical in virtual and physical address– Can index into cache while address translation taking place– Generally we hit in TLB, so PPN bits (CT bits) available next– “Virtually indexed, physically tagged”– Cache carefully sized to make this possible
Physical address
(PA)
CT CO36 6
CI6
Virtual address
(VA) VPN VPO
36 12
PPOPPN
AddressTranslation
NoChange
CI
L1 Cache
CT Tag Check
32
Process-specific data structures(e.g. task and mm structs,page tables, kernel stack)
Process-specific data structures(e.g. task and mm structs,page tables, kernel stack)
Physical memoryPhysical memory
Kernel code and dataKernel code and data
User stackUser stack
Memory mapped regionfor shared libraries
Memory mapped regionfor shared libraries
Run-time heap (via malloc)Run-time heap (via malloc)
Uninitialized data (.bss)Uninitialized data (.bss)
Initialized data (.data)Initialized data (.data)
Program text (.text)Program text (.text)
Kernel virtual memory
Processvirtualmemory
Different foreach process
Identical foreach process
%esp
brk
x08048000 (32)x40000000 (64)
Linux Virtual Memory System