Caches Master
-
Upload
suyashi-purwar -
Category
Documents
-
view
231 -
download
0
Transcript of Caches Master
-
7/30/2019 Caches Master
1/51
An example memory hierarchy
registers
on-chip L1cache (SRAM)
main memory(DRAM)
local secondary storage(local disks)
Larger,slower,
andcheaper
(per byte)storagedevices
remote secondary storage(distributed file systems, Web servers)
Local disks hold filesretrieved from disks onremote network servers.
Main memory holds diskblocks retrieved from localdisks.
off-chip L2cache (SRAM)
L1 cache holds cache lines retrievedfrom the L2 cache.
CPU registers hold words retrievedfrom cache memory.
L2 cache holds cache linesretrieved from memory.
L0:
L1:
L2:
L3:
L4:
L5:
Smaller,faster,and
costlier(per byte)storagedevices
-
7/30/2019 Caches Master
2/51
Principle of Locality
Programs tend to re-use data and instructions near
those they have used recently
Two types of locality1. Temporal locality: items currently being used are
likely to be reused in the near future
2. Spatial locality: items with nearby addresses tend
to be referenced close together in time
Principle of Locality
-
7/30/2019 Caches Master
3/51
Are there things we can do in our code to
allow us to benefit from the principle of
locality?
Programs that reference the same variables
repeatedly have good locality
Programs with small strides in array
locations have good locality Programs with small loops with many
iterations have good locality
Programming to Maximize Benefits of
Locality
-
7/30/2019 Caches Master
4/51
Locality Exampleint sumvec(int v[N]) {
int i, sum = 0;
for (i = 0; i < n; i++)
sum += v[i];
return sum;
}
Locality: Data
Reference array elements in succession
(stride-1 reference pattern): Spatial locality
Reference sum each iteration: Temporal locality
InstructionsReference instructions in sequence: Spatial locality
Cycle through loop repeatedly: Temporal locality
-
7/30/2019 Caches Master
5/51
Locality Example
How is the locality of this program?
Note the elements are accessed in the orderthey are stored in memory This implies good locality
-
7/30/2019 Caches Master
6/51
Locality Example 2
Claim: Being able to look at code and get a qualitative sense of its
locality is a key skill for a professional programmer.
Question: Does this function have good locality?
-
7/30/2019 Caches Master
7/51
Locality Example 2
How is the locality of this program?
Recall arrays in C/C++ are stored row-wise This means that all of row 1 is stored contiguously, then
row 2, etc.
This leads to good locality
-
7/30/2019 Caches Master
8/51
Locality Example 3
.Question: Does this function have good locality?
Assume a 2x3 array for ease
int sum_array_cols(int a[M][N])
{ int i, j, sum = 0;
for (j = 0; j < N; j++)
for (i = 0; i < M; i++)
sum += a[i][j];
return sum;}
-
7/30/2019 Caches Master
9/51
Locality Example 3
How is the locality of this program ?
Recall arrays in C/C++ are stored row-wise But we are accessing the array by columns
This leads to very poor locality
-
7/30/2019 Caches Master
10/51
Locality Example 4 Question: Can you permute the loops so that the function scans the
3-d array a[ ] with a stride-1 reference pattern (and thus has good
spatial locality)?
int sumarray3d(int a [N] [N] [N]){
int i, j, k, sum = 0;
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
for (k = 0; k < N; k++)
sum += a[k][i][j];return sum;
}
-
7/30/2019 Caches Master
11/51
Locality Example 4 Recall: the best indexing for a 2-D array is to start at a row and
count across all the columns in a[i][j]
i.e. increment the j counter in the inner loop
So: for multi-dimensional loops, simply run the loops
in the order of the indices
int sumarray3d(int a [N] [N] [N]){
int i, j, k, sum = 0;
for (k = 0; k < N; k++)
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum += a[k][i][j];return sum;
}
-
7/30/2019 Caches Master
12/51
An example memory hierarchy
registers
on-chip L1cache (SRAM)
main memory(DRAM)
local secondary storage(local disks)
Larger,slower,
andcheaper
(per byte)storagedevices
remote secondary storage(distributed file systems, Web servers)
Local disks hold filesretrieved from disks onremote network servers.
Main memory holds diskblocks retrieved from localdisks.
off-chip L2cache (SRAM)
L1 cache holds cache lines retrievedfrom the L2 cache.
CPU registers hold words retrievedfrom cache memory.
L2 cache holds cache linesretrieved from memory.
L0:
L1:
L2:
L3:
L4:
L5:
Smaller,faster,and
costlier(per byte)storagedevices
-
7/30/2019 Caches Master
13/51
Definition of a Cache
The word cache is used throughout the memory
hierarchy diagram
What is a Cache?
A smaller, faster storage device that acts as astaging area for a subset of the data in a larger,
slower device.
Why would a cache help us at all?
It relies on the Principle of Locality
-
7/30/2019 Caches Master
14/51
Caching in the Memory Hierarchy
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
9 14 3Level k:
Level k+1:
Larger, slower, cheaper storagedevice at level k+1 is partitioned
into blocks.
Data is copied between
levels in block-sized transfer units
Smaller, faster, more expensive
memory caches a subset of
the blocks
4
4
10
14
It works because programs tend to access data at level k more often
than at k+1.
-
7/30/2019 Caches Master
15/51
Caching in the Memory
Hierarchy II
Goal:
The memory hierarchy provides a large pool
of memory that: Costs as much as the cheap storage near the
bottom (well, only a little bit more),
butserves data to programs at the rate of the faststorage near the top.
-
7/30/2019 Caches Master
16/51
General caching concepts I
Program needsobject d, which is
stored in block 14.
Block 14 is locatedin the level k cache
This is called a
Cache hit
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
9 14 3Level k:
Level k+1:
4
4
10
14
-
7/30/2019 Caches Master
17/51
General caching concepts II
Program needs objectb, which is stored inblock 12.
b is not at level k, solevel k cache must
fetch it from level k+1. This is called a Cache
miss
Simplest case is if
there is a spot for block12 in level k
What happens if level kis full?
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
9 14 3Level k:
Level k+1:
4
4
10
14
-
7/30/2019 Caches Master
18/51
General caching concepts III
Program needs object b, whichis stored in block 12.
b is not at level k, so level k
cache must fetch it from level
k+1.
level k cache is full must be replaced (evicted)
Which one is the victim?
Placement policy:
where can the new block
go? E.g., b mod 4
Replacement policy:
which block should be
evicted? E.g., LRU
-
7/30/2019 Caches Master
19/51
General Caching Concepts IV
Block A block is the smallest amount of memory that is
stored in a cache
The size of the block can be different for each stage
in the memory hierarchy
We rely on what kind of locality to ensure theadditional data in the block is needed?
Transfer Unit
These are the amount of data that we transfer intoand out of a cache when we need new data
A transfer unit is the same size as the block for thatstage
-
7/30/2019 Caches Master
20/51
General Caching Concepts V
Cache hit
Program finds b in the cache at level k.
Cache miss
b is not at level k, so level k cache must fetch it fromlevel k+1. E.g., next lower level in the hierarchy
Placement policy: where can the new block go?
If level k cache is full, then some current block must be
replaced Replacement policy: which block should be evicted?
-
7/30/2019 Caches Master
21/51
General Caching Concepts .
Cache Misses
Recall a cache miss is when the data wasnot in the cache
What can cause a block not to be in the cache
at a given time? There are three kinds of misses:
1. Cold (compulsory) miss
2. Conflict miss
3. Capacity miss
-
7/30/2019 Caches Master
22/51
General Caching Concepts -
Cache Misses II
Cold (compulsory) miss
Cold misses occur because the cache is empty
This happens when the machine starts-up
Capacity miss
Occurs when the set of active cache blocks (working
set) is larger than the cache
There is no longer enough room in the cache for all the dataso some block needs to be discarded.
-
7/30/2019 Caches Master
23/51
General Caching Concepts .
Cache Misses III
Conflict miss Most caches limit blocks at level k+1 to a small subset
(sometimes a singleton) of positions at level k.
E.g. Block i at level k+1 must be placed in block (i mod 4) atlevel k+1.
Conflict misses occur when multiple data objectsall map to the same level k block. E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss
every time in a cache with 4 blocks where we store atlocation i mod 4.
-
7/30/2019 Caches Master
24/51
General Org of a Cache Memory
The cache contains sets of memory words called Blocks These blocks usually contain a few words of memory
Each block is part of a Line in the cache
The line contains identifying information on what is actually in theblocks and a status bit to say whether the data in the block is stillvalid
These lines are then grouped into Sets There may be 1 or more lines of cache data per set
The exact number of lines per set is determined by the type ofcache (more on that in a few slides)
-
7/30/2019 Caches Master
25/51
General Org of a Cache Memory
-
7/30/2019 Caches Master
26/51
Four Key Questions for
Designing a Cache1. Block Identification . How is a block found if it
is in the upper level ?
2. Block Placement - Where can a block be placed
in the upper level?
3. Block Replacement . Which block should bereplaced when there is a cache miss?
4. Write Strategy . What happens on a memorywrite?
-
7/30/2019 Caches Master
27/51
Q1: How is a block found if it is in the
cache? Blocks are maintained based on their block address
(assume it is a physical address in memory)
The address is divided into two parts:
The block address and The block offset (the LSBs that correspond to the
location of the desired data within the Block
The block address is further divided into two:
1. set index tells which Set
2. tag provides the remaining block address bits
-
7/30/2019 Caches Master
28/51
Decoding the Address for a
Cached Word The following stages are needed to find a word in a cache
Different parts of the actual word address provide these values
1. Read the set IDto see which set the word may be in
Called set select ion2. Read the tagto see if the word we want is actually in
this line of the cache
The tag is a type of hash that provides the word address (at
least part of it)
Called l ine matchingis line of the cache
-
7/30/2019 Caches Master
29/51
Decoding the Address for a
Cached Word II3. Read the val id bit
- Only want to read the desired word if the cache line is valid
4. Read the block of fsetvalue to know which wordin the line isthe word we want
- Called wo rd Extraction
-
7/30/2019 Caches Master
30/51
Decoding the Address for a
Cached Word
-
7/30/2019 Caches Master
31/51
Order of the Bits Used in the
Address
Remember that the bit fields in the actual wordaddress are used to access the cache A simple hash function uses these fields to access
the lines and individual words in the cache
Generally: a hash (hash value) is a non-uniqueaddress used to quickly access a table It is not unique because hash tables can only store
small subsets of a larger range of data values
The pigeon hole principle is in effect
The order of the bit field selection is critical toeffective performance
-
7/30/2019 Caches Master
32/51
Order of the Bits Used in the
Address II First, the middle bits are used for set selection
Why does this make sense?
Its undesirable for nearby words in memory to map to the same line
in the cache
It will cause the cache to replace possibly useful data too often
-
7/30/2019 Caches Master
33/51
Order of the Bits Used in the
Address III
It would go against our knowledge of spatial
Locality
Here we want consecutive words in different lines ofthe cache
This way, when the program accesses the next
data word in memory, the cache wont need to
discard the last one. Instead, the next cache
line is accessed, so the cache keeps them both.
-
7/30/2019 Caches Master
34/51
Why Use the Middle Bits as an
Index?
High-Order Bit Indexing
Adjacent memory lines would mapto same cache
Entry Poor use of spatial locality
Middle-Order Bit Indexing
Consecutive memory lines map todifferent cache lines
(Least significant bits are not
shown here)
-
7/30/2019 Caches Master
35/51
Order of the Bits Used in the
Address IV Second, the top (most significant) bits are used for line
matching (tag matching) Why does this make sense?
If the process got this far, then the middle order bitsalready match Again, this is using knowledge of spatial locality
-
7/30/2019 Caches Master
36/51
Order of the Bits Used in the
Address V If the set index bits match, then the higher order bits need to
be checked
These are only the same if the address is from the same
higher order bit region of memory
With locality and the limited cache size, we know that thetag will indicate whether we are dealing with the same
words (the same block)
-
7/30/2019 Caches Master
37/51
Order of the Bits Used in the
Address VI
Third (finally), the low order bits are used for
word extraction
Why does this make sense?
These are the most unique identifiers for a word in memory Another word with the same low order bits will be located far
away in memory
Therefore, according to the principle of locality, it is unlikely
that another word in memory with the same LSBs will be
needed near the same time
-
7/30/2019 Caches Master
38/51
Four Key Questions for
Designing a Cache1. Block Identification . How is a block found if it
is in the upper level ?
2. 2. Block Placement - Where can a block be placed
in the upper level?
3. Block Replacement . Which block should bereplaced when there is a cache miss?
4. Write Strategy . What happens on a memorywrite?
-
7/30/2019 Caches Master
39/51
Q2: Where Can a Block be
Placed in a Cache?
. There are three approaches to blockplacement: 1. Direct Mapped
The block can only be placed in a specific location in the
cache 2. Set-Assoc iative
A set is a group of blocks in the cache
The incoming block can only be placed in a specific set ofblocks
The block can be placed anywhere within this set
3. Ful ly Associat ive The block can be placed anywhere in the cache
-
7/30/2019 Caches Master
40/51
Mapping Memory Blocks into the
Cache
This is really a continuum of associativeproperties:
Direct mapped can be considered 1-way set
associative (the block has only 1 alloweddestination)
For N-way set associative, the block has Nplaces it can be placed
Fully associative can be considered m-wayset associative (where m is the number ofblocks), since the block can go anywhere
-
7/30/2019 Caches Master
41/51
Mapping Memory Blocks into the
Cache II
-
7/30/2019 Caches Master
42/51
Direct-Mapped Cache
Simplest kind of cache
Characterized by exactly one line per set.
Therefore, the tag and set index are
somewhat redundant The tag is insurance that the wrong word is not
used
-
7/30/2019 Caches Master
43/51
Accessing Direct-Mapped
Caches I
. Step 1: Set selection
identical to direct-mapped cache
-
7/30/2019 Caches Master
44/51
Accessing Direct-Mapped
Caches II
Step 2 : Line matching
Trivial, since there is only one line in this cache set
-
7/30/2019 Caches Master
45/51
Accessing Direct-Mapped
Caches III
Step 3 : Word selection
Use the block offset to grab the right word.
S W d Ab B k
-
7/30/2019 Caches Master
46/51
Some Words About Book
Conventions . First . Most direct mapped caches do not even talk
about a set index There is only 1 line per set - it doesn.t make much sense
They may do it to make the later caches easier to understand,but most systems don.t use the set index with direct mappedorganizations
Second . the last figure talked about w0 being the lowerorder byte of word w, w1 is the next byte etc. It is easy in this convention to confuse words and bytes
It should have used w0 and w1 to talk about the words in thecache and then used b00 and b10 to talk about the low orderbytes of the words 0 and 1 in the cache line
-
7/30/2019 Caches Master
47/51
Direct-Mapped Cache Simulation
-
7/30/2019 Caches Master
48/51
Recall: Cache Misses
A cache miss is when the data was not inthe cache
What can cause a block not to be in the cache
at a given time? There are three kinds of misses:
Cold (compulsory) miss
Conflict miss
Capacity miss
What kinds of cache misses did we justsee?
-
7/30/2019 Caches Master
49/51
Direct-Mapped Cache Simulation
Results
Di t M d C h
-
7/30/2019 Caches Master
50/51
Direct Mapped Cache
Observations Direct mapped caches are easy to implement
Finding the line and the set are one step, since there is only oneline per set
What are the problems?
The simple example showed that it is already experiencingmisses due to conflicts
The conflicts occurred because there is only one line
What was needed: a second line for each set index to allow bothdesired values to be placed
This leads to set associative caches It will also introduce the need for replacement strategies
Up to now, the replacement was only dictated by the address
-
7/30/2019 Caches Master
51/51
Set Associative Caches
Characterized by more than one line perset