CMPE 421 Advanced Computer Architecture
Other Cache Organizations
Direct Mapped
Address = Tag | Index | Block offset
Each address has only one possible location
Entries are indexed 0 through 15; each entry holds V, Tag, Data
Fully Associative
No Index
Address = Tag | Block offset
Each entry holds V, Tag, Data
A Compromise
2-Way set associative
Address = Tag | Index | Block offset
Each address has two possible locations with the same index
One fewer index bit: 1/2 the indexes
Sets are indexed 0 through 7; each way holds V, Tag, Data
4-Way set associative
Address = Tag | Index | Block offset
Each address has four possible locations with the same index
Two fewer index bits: 1/4 the indexes
Sets are indexed 0 through 3; each way holds V, Tag, Data
Range of Set Associative Caches
Address fields: Tag | Index | Block offset | Byte offset
Tag: used for tag compare
Index: selects the set
Block offset: selects the word in the block
Decreasing associativity: toward direct mapped (only one way), smaller tags
Increasing associativity: toward fully associative (only one set), tag is all the bits except block and byte offset
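The trade-off between index bits and tag bits can be checked with a few lines of Python. This is only a sketch: the function name `field_widths`, the 32-bit address width, and the 2-bit block and byte offsets are assumptions chosen for illustration, and the 16-block cache size mirrors the 16-entry direct-mapped figure.

```python
def field_widths(num_blocks, ways, addr_bits=32,
                 block_offset_bits=2, byte_offset_bits=2):
    """Return (tag_bits, index_bits) for a cache of `num_blocks` blocks.

    Assumes num_blocks and ways are powers of two (hypothetical helper).
    """
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1  # log2 of a power of two
    tag_bits = addr_bits - index_bits - block_offset_bits - byte_offset_bits
    return tag_bits, index_bits


# Same 16-block cache, from direct mapped (1 way) to fully associative (16 ways).
for ways in (1, 2, 4, 8, 16):
    tag_bits, index_bits = field_widths(num_blocks=16, ways=ways)
    print(f"{ways:2d}-way: {16 // ways:2d} sets, index={index_bits} bits, tag={tag_bits} bits")
```

Each doubling of the associativity halves the number of sets, so the index loses one bit and the tag gains one.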
Set Associative Cache
Main Memory: one word blocks at addresses 0000xx 0001xx 0010xx 0011xx 0100xx 0101xx 0110xx 0111xx 1000xx 1001xx 1010xx 1011xx 1100xx 1101xx 1110xx 1111xx
Two low order bits define the byte in the word (32-bit words)
Cache: 2 sets (0 and 1), each with 2 ways (0 and 1); each entry holds V, Tag, Data
Q1: How do we find it?
Use the next 1 low order memory address bit to determine which cache set, i.e., (block address) modulo (# of sets in the cache)
Q2: Is it there?
Compare all the cache tags in the set to the high order 3 memory address bits to tell if the memory block is in the cache
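Q1 and Q2 can be sketched in Python for the small cache on this slide (2 sets, 2 ways, one-word blocks, 6-bit byte addresses). The names `split_address` and `is_hit` and the sample cache contents are hypothetical, chosen only for illustration.

```python
NUM_SETS = 2          # two sets, as on the slide
BYTE_OFFSET_BITS = 2  # 32-bit words, one-word blocks


def split_address(addr):
    """Split a 6-bit byte address into (tag, set_index, byte_offset)."""
    byte_offset = addr & 0b11
    block_address = addr >> BYTE_OFFSET_BITS
    set_index = block_address % NUM_SETS   # Q1: next low order bit picks the set
    tag = block_address // NUM_SETS        # the high order 3 bits
    return tag, set_index, byte_offset


def is_hit(cache, addr):
    """Q2: compare the tag against every valid way of the selected set."""
    tag, set_index, _ = split_address(addr)
    return any(valid and stored == tag for valid, stored in cache[set_index])


# cache[set][way] = (valid, tag); assumed contents for the demonstration
cache = [
    [(True, 0b000), (True, 0b010)],  # set 0 holds blocks 0000 and 0100
    [(False, 0), (False, 0)],        # set 1 is empty
]

print(split_address(0b010000), is_hit(cache, 0b010000))
print(split_address(0b011100), is_hit(cache, 0b011100))
```

In hardware the tag comparisons of one set happen in parallel, one comparator per way; the loop here only models the result.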
Set Associative Cache Organization
FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1 multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the element that matches to drive the data outputs.
Remember the Example for Direct Mapping (ping pong effect)
Consider the main memory word reference string 0 4 0 4 0 4 0 4
Start with an empty cache - all blocks initially marked as not valid
Reference   Result   Tag and data in the shared cache block after the access
0           miss     00 Mem(0)
4           miss     01 Mem(4)
0           miss     00 Mem(0)
4           miss     01 Mem(4)
0           miss     00 Mem(0)
4           miss     01 Mem(4)
0           miss     00 Mem(0)
4           miss     01 Mem(4)
8 requests, 8 misses
Ping pong effect due to conflict misses - two memory locations that map into the same cache block
Solution: Use set associative cache
Consider the main memory word reference string 0 4 0 4 0 4 0 4
Start with an empty cache - all blocks initially marked as not valid
Reference   Result   Tags and data in the shared cache set after the access
0           miss     000 Mem(0)
4           miss     000 Mem(0), 010 Mem(4)
0           hit      000 Mem(0), 010 Mem(4)
4           hit      000 Mem(0), 010 Mem(4)
The remaining references 0 4 0 4 all hit
8 requests, 2 misses
Solves the ping pong effect in a direct mapped cache due to conflict misses, since now two memory locations that map into the same cache set can co-exist!
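The two miss counts for the reference string 0 4 0 4 0 4 0 4 can be reproduced with a small simulation. This is a sketch under stated assumptions: a four-block cache with one-word blocks, LRU replacement within a set, and a hypothetical helper named `count_misses`.

```python
from collections import OrderedDict


def count_misses(word_addresses, num_blocks, ways):
    """Count misses for one-word blocks with LRU replacement inside each set."""
    num_sets = num_blocks // ways
    # One OrderedDict per set: keys are tags, ordered from least to most recent.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in word_addresses:
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # set is full: evict the LRU tag
            lines[tag] = True
    return misses


refs = [0, 4, 0, 4, 0, 4, 0, 4]
print("direct mapped:", count_misses(refs, num_blocks=4, ways=1), "misses")
print("2-way set associative:", count_misses(refs, num_blocks=4, ways=2), "misses")
```

With one way, words 0 and 4 keep evicting each other from the same block; with two ways they share a set and only the first two references miss.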
Set Associative Example
Address fields: Tag (3-5 bits) | Index (1-3 bits) | Block offset (2 bits) | Byte offset (2 bits)
Reference string (10-bit addresses): 0100111000 1100110100 0100111100 0110110000 1100111000
Organization       Index values   Results
Direct-Mapped      000 to 111     Miss Miss Miss Miss Miss
2-Way Set Assoc.   00 to 11       Miss Miss Hit Miss Miss
4-Way Set Assoc.   0 to 1         Miss Miss Hit Miss Hit
Tag / index of each reference:
Address      Direct-Mapped   2-Way Set Assoc.   4-Way Set Assoc.
0100111000   010 / 011       0100 / 11          01001 / 1
1100110100   110 / 011       1100 / 11          11001 / 1
0100111100   010 / 011       0100 / 11          01001 / 1
0110110000   011 / 011       0110 / 11          01101 / 1
1100111000   110 / 011       1100 / 11          11001 / 1
New Performance Numbers
Miss rates for DEC 3100 (MIPS machine)
Separate 64KB Instruction/Data Caches
Benchmark   Associativity   Instruction miss rate   Data miss rate   Combined miss rate
gcc         Direct          2.0%                    1.7%             1.9%
gcc         2-way           1.6%                    1.4%             1.5%
gcc         4-way           1.6%                    1.4%             1.5%
spice       Direct          0.3%                    0.6%             0.4%
spice       2-way           0.3%                    0.6%             0.4%
spice       4-way           0.3%                    0.6%             0.4%
Benefits of Set Associative Caches
The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation
[Figure: Miss Rate (axis from 0 to 12) versus Associativity (1-way, 2-way, 4-way, 8-way), one curve per cache size: 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB. Data from Hennessy & Patterson, Computer Architecture, 2003]
Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)
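Virtual Memory (32-bit system): 8KB page size, 16MB Mem
Virtual Address: bits 31-13 are the virtual page number (19 bits, the index into the page table); bits 12-0 are the page offset (13 bits)
Physical Address: bits 23-13 are the physical page number (11 bits); bits 12-0 are the page offset
Page table: indexed by Virt. Pg. # (0, 1, 2, ... 512K); each entry holds V and Phys. Page # or Disk Address
4GB / 8KB = 512K entries
2^19 = 512K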
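The field widths and the page table size follow directly from the three parameters of the system. The sketch below assumes power-of-two sizes; the helper name `paging_layout` and its dictionary keys are hypothetical.

```python
def paging_layout(virtual_addr_bits, page_size_bytes, physical_mem_bytes):
    """Derive the address field widths and page table size (power-of-two sizes assumed)."""
    offset_bits = page_size_bytes.bit_length() - 1            # log2(page size)
    vpn_bits = virtual_addr_bits - offset_bits                # virtual page number
    physical_pages = physical_mem_bytes // page_size_bytes
    ppn_bits = physical_pages.bit_length() - 1                # physical page number
    return {
        "offset_bits": offset_bits,
        "vpn_bits": vpn_bits,
        "ppn_bits": ppn_bits,
        "page_table_entries": 1 << vpn_bits,
    }


# 32-bit system, 8KB pages, 16MB of physical memory
print(paging_layout(32, 8 * 1024, 16 * 1024 * 1024))
```

For the 32-bit system this gives a 13-bit page offset, a 19-bit virtual page number, an 11-bit physical page number, and 2^19 = 524288 page table entries.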
Virtual memory example
System with 20-bit V.A., 16KB pages, 256KB of physical memory
Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N.
Page Table:
Virtual Page # (index)   Valid Bit   Physical Page # / Disk address
000000                   1           1001
000001                   0           sector 5000...
000010                   1           0010
000011                   0           sector 4323...
000100                   1           1011
000101                   1           1010
000110                   0           sector 1239...
000111                   1           0001
Access to: 0000 1000 1100 1010 1010
VPN = 000010, valid bit is 1
PPN = 0010
Physical Address: 00 1000 1100 1010 1010
Access to: 0001 1001 0011 1100 0000
VPN = 000110, valid bit is 0
PPN = Page Fault to sector 1239...
Pick a page to "kick out" of memory (use LRU).
Assume LRU is VPN 000101 for this example.
Page table update: VPN 000101 becomes valid 0, sector xxxx...; VPN 000110 becomes valid 1, PPN 1010
Read data from sector 1239 into PPN 1010
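Both accesses can be traced in a short Python sketch. The page table contents are taken from the example; the names `translate`, `handle_fault`, and `PageFault` are hypothetical, and the LRU victim is passed in explicitly because the example simply assumes it is VPN 000101.

```python
OFFSET_BITS = 14  # 16KB pages


class PageFault(Exception):
    """Raised with the faulting VPN when the valid bit is 0."""


# page_table[vpn] = (valid bit, physical page number or disk address)
page_table = {
    0b000000: (1, 0b1001),
    0b000001: (0, "sector 5000..."),
    0b000010: (1, 0b0010),
    0b000011: (0, "sector 4323..."),
    0b000100: (1, 0b1011),
    0b000101: (1, 0b1010),
    0b000110: (0, "sector 1239..."),
    0b000111: (1, 0b0001),
}


def translate(vaddr):
    """Translate a 20-bit virtual address into an 18-bit physical address."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    valid, entry = page_table[vpn]
    if not valid:
        raise PageFault(vpn)
    return (entry << OFFSET_BITS) | offset


def handle_fault(faulting_vpn, victim_vpn):
    """Give the victim's physical page to the faulting page (disk I/O not modeled)."""
    _, ppn = page_table[victim_vpn]
    page_table[victim_vpn] = (0, "sector xxxx...")
    page_table[faulting_vpn] = (1, ppn)


print(format(translate(0b00001000110010101010), "018b"))  # first access
try:
    translate(0b00011001001111000000)                      # second access
except PageFault as fault:
    handle_fault(fault.args[0], victim_vpn=0b000101)
print(format(translate(0b00011001001111000000), "018b"))  # retry after the fault
```

After the fault is handled, the retried access maps to PPN 1010 with the page offset unchanged.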