Caches Master


Transcript of Caches Master

  • 7/30/2019 Caches Master

    1/51

    An example memory hierarchy

    Smaller, faster, and costlier (per byte) storage devices sit at the
    top; larger, slower, and cheaper (per byte) storage devices sit at
    the bottom:

    L0: registers
    L1: on-chip L1 cache (SRAM)
    L2: off-chip L2 cache (SRAM)
    L3: main memory (DRAM)
    L4: local secondary storage (local disks)
    L5: remote secondary storage (distributed file systems, Web servers)

    CPU registers hold words retrieved from cache memory.
    L1 cache holds cache lines retrieved from the L2 cache.
    L2 cache holds cache lines retrieved from memory.
    Main memory holds disk blocks retrieved from local disks.
    Local disks hold files retrieved from disks on remote network servers.

  • 2/51

    Principle of Locality

    Programs tend to re-use data and instructions near
    those they have used recently.

    Two types of locality:

    1. Temporal locality: items currently being used are
    likely to be reused in the near future.

    2. Spatial locality: items with nearby addresses tend
    to be referenced close together in time.

  • 3/51

    Programming to Maximize Benefits of
    Locality

    Are there things we can do in our code to
    allow us to benefit from the principle of
    locality?

    Programs that reference the same variables
    repeatedly have good temporal locality.

    Programs with small strides through array
    locations have good spatial locality.

    Programs with small loops with many
    iterations have good locality.

  • 4/51

    Locality Example

    int sumvec(int v[N])
    {
        int i, sum = 0;
        for (i = 0; i < N; i++)
            sum += v[i];
        return sum;
    }

    Data:
    Reference array elements in succession
    (stride-1 reference pattern): spatial locality.
    Reference sum each iteration: temporal locality.

    Instructions:
    Reference instructions in sequence: spatial locality.
    Cycle through the loop repeatedly: temporal locality.

  • 5/51

    Locality Example

    How is the locality of this program?

    Note the elements are accessed in the order
    they are stored in memory. This implies good locality.

  • 6/51

    Locality Example 2

    Claim: Being able to look at code and get a qualitative sense of its

    locality is a key skill for a professional programmer.

    Question: Does this function have good locality?
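The function shown on this slide did not survive transcription. A row-wise summation consistent with the surrounding discussion is sketched below; the name sum_array_rows and the 2x3 dimensions are assumptions for illustration, not taken from the slide.

```c
#include <assert.h>

#define M 2
#define N 3

/* Row-wise traversal: the inner loop walks across one row at a time,
   so successive references touch consecutive memory locations
   (a stride-1 pattern with good spatial locality). */
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
```

Swapping the two loops gives the column-wise version discussed a few slides later.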

  • 7/51

    Locality Example 2

    How is the locality of this program?

    Recall that arrays in C/C++ are stored row-wise.
    This means that all of row 1 is stored contiguously, then
    row 2, etc.

    This leads to good locality.

  • 8/51

    Locality Example 3

    Question: Does this function have good locality?
    Assume a 2x3 array for ease.

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

  • 9/51

    Locality Example 3

    How is the locality of this program?

    Recall that arrays in C/C++ are stored row-wise,
    but we are accessing the array by columns.

    This leads to very poor locality.

  • 10/51

    Locality Example 4

    Question: Can you permute the loops so that the function scans the
    3-d array a[] with a stride-1 reference pattern (and thus has good
    spatial locality)?

    int sumarray3d(int a[N][N][N])
    {
        int i, j, k, sum = 0;
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    sum += a[k][i][j];
        return sum;
    }

  • 11/51

    Locality Example 4

    Recall: the best indexing for a 2-D array is to start at a row and
    count across all the columns in a[i][j],
    i.e. increment the j counter in the inner loop.

    So: for multi-dimensional arrays, simply run the loops
    in the order of the indices.

    int sumarray3d(int a[N][N][N])
    {
        int i, j, k, sum = 0;
        for (k = 0; k < N; k++)
            for (i = 0; i < N; i++)
                for (j = 0; j < N; j++)
                    sum += a[k][i][j];
        return sum;
    }

  • 12/51

    An example memory hierarchy

    Smaller, faster, and costlier (per byte) storage devices sit at the
    top; larger, slower, and cheaper (per byte) storage devices sit at
    the bottom:

    L0: registers
    L1: on-chip L1 cache (SRAM)
    L2: off-chip L2 cache (SRAM)
    L3: main memory (DRAM)
    L4: local secondary storage (local disks)
    L5: remote secondary storage (distributed file systems, Web servers)

    CPU registers hold words retrieved from cache memory.
    L1 cache holds cache lines retrieved from the L2 cache.
    L2 cache holds cache lines retrieved from memory.
    Main memory holds disk blocks retrieved from local disks.
    Local disks hold files retrieved from disks on remote network servers.

  • 13/51

    Definition of a Cache

    The word cache is used throughout the memory
    hierarchy diagram.

    What is a cache?
    A smaller, faster storage device that acts as a
    staging area for a subset of the data in a larger,
    slower device.

    Why would a cache help us at all?
    It relies on the Principle of Locality.

  • 14/51

    Caching in the Memory Hierarchy

    The larger, slower, cheaper storage device at level k+1 is
    partitioned into blocks (blocks 0 through 15 in the figure).

    The smaller, faster, more expensive memory at level k caches a
    subset of those blocks (blocks 4, 9, 14, and 3 in the figure).

    Data is copied between levels in block-sized transfer units.

    It works because programs tend to access data at level k more often
    than at level k+1.

  • 15/51

    Caching in the Memory
    Hierarchy II

    Goal: the memory hierarchy provides a large pool
    of memory that:

    Costs as much as the cheap storage near the
    bottom (well, only a little bit more), but

    serves data to programs at the rate of the fast
    storage near the top.

  • 16/51

    General caching concepts I

    The program needs object d, which is
    stored in block 14.

    Block 14 is located in the level k cache.

    This is called a cache hit.

  • 17/51

    General caching concepts II

    The program needs object b, which is stored in
    block 12.

    b is not at level k, so the level k cache must
    fetch it from level k+1. This is called a cache
    miss.

    The simplest case is if there is a spot for block
    12 in level k.

    What happens if level k is full?

  • 18/51

    General caching concepts III

    The program needs object b, which
    is stored in block 12.

    b is not at level k, so the level k
    cache must fetch it from level
    k+1.

    If the level k cache is full, some current
    block must be replaced (evicted).
    Which one is the victim?

    Placement policy:
    where can the new block
    go? E.g., b mod 4.

    Replacement policy:
    which block should be
    evicted? E.g., LRU.

  • 19/51

    General Caching Concepts IV

    Block
    A block is the smallest amount of memory that is
    stored in a cache.
    The size of the block can be different for each stage
    in the memory hierarchy.
    We rely on what kind of locality to ensure the
    additional data in the block is needed?

    Transfer Unit
    This is the amount of data that we transfer into
    and out of a cache when we need new data.
    A transfer unit is the same size as the block for that
    stage.

  • 20/51

    General Caching Concepts V

    Cache hit
    The program finds b in the cache at level k.

    Cache miss
    b is not at level k, so the level k cache must fetch it from
    level k+1, e.g. the next lower level in the hierarchy.

    Placement policy: where can the new block go?
    If the level k cache is full, then some current block must be
    replaced.
    Replacement policy: which block should be evicted?

  • 21/51

    General Caching Concepts -
    Cache Misses

    Recall a cache miss is when the data was
    not in the cache.

    What can cause a block not to be in the cache
    at a given time? There are three kinds of misses:

    1. Cold (compulsory) miss
    2. Conflict miss
    3. Capacity miss

  • 22/51

    General Caching Concepts -
    Cache Misses II

    Cold (compulsory) miss
    Cold misses occur because the cache is empty.
    This happens when the machine starts up.

    Capacity miss
    Occurs when the set of active cache blocks (the working
    set) is larger than the cache.
    There is no longer enough room in the cache for all the data,
    so some block needs to be discarded.

  • 23/51

    General Caching Concepts -
    Cache Misses III

    Conflict miss
    Most caches limit blocks at level k+1 to a small subset
    (sometimes a singleton) of positions at level k.
    E.g. block i at level k+1 must be placed in block (i mod 4) at
    level k.

    Conflict misses occur when multiple data objects
    all map to the same level k block.
    E.g. referencing blocks 0, 8, 0, 8, 0, 8, ... would miss
    every time in a cache with 4 blocks where we store at
    location i mod 4.
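The 0, 8, 0, 8, ... pattern can be checked with a short simulation. This is an illustrative sketch, not code from the slides: the cache is modeled as 4 single-line slots indexed by block number mod 4, and the function name and parameters are assumptions.

```c
#include <assert.h>

#define NBLOCKS 4   /* cache holds 4 blocks; block i goes to slot i mod 4 */

/* Count misses for a sequence of block references in a tiny
   direct-mapped cache. Returns the miss count. */
int count_misses(const int *refs, int n)
{
    int slot_tag[NBLOCKS];
    int valid[NBLOCKS] = {0};
    int misses = 0;
    for (int i = 0; i < n; i++) {
        int slot = refs[i] % NBLOCKS;
        if (!valid[slot] || slot_tag[slot] != refs[i]) {
            misses++;               /* cold or conflict miss */
            valid[slot] = 1;
            slot_tag[slot] = refs[i];
        }
    }
    return misses;
}
```

Blocks 0 and 8 both map to slot 0, so they evict each other on every reference, while a pattern touching blocks that map to different slots misses only once per block.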

  • 24/51

    General Org of a Cache Memory

    The cache contains sets of memory words called blocks.
    These blocks usually contain a few words of memory.

    Each block is part of a line in the cache.
    The line contains identifying information on what is actually in the
    block and a status bit to say whether the data in the block is still
    valid.

    These lines are then grouped into sets.
    There may be 1 or more lines of cache data per set.
    The exact number of lines per set is determined by the type of
    cache (more on that in a few slides).

  • 25/51

    General Org of a Cache Memory

  • 26/51

    Four Key Questions for
    Designing a Cache

    1. Block Identification - How is a block found if it
    is in the upper level?

    2. Block Placement - Where can a block be placed
    in the upper level?

    3. Block Replacement - Which block should be
    replaced when there is a cache miss?

    4. Write Strategy - What happens on a memory
    write?

  • 27/51

    Q1: How is a block found if it is in the
    cache?

    Blocks are maintained based on their block address
    (assume it is a physical address in memory).

    The address is divided into two parts:
    the block address, and
    the block offset (the LSBs that correspond to the
    location of the desired data within the block).

    The block address is further divided into two:
    1. the set index tells which set
    2. the tag provides the remaining block address bits

  • 28/51

    Decoding the Address for a
    Cached Word

    The following stages are needed to find a word in a cache.
    Different parts of the actual word address provide these values.

    1. Read the set ID to see which set the word may be in.
    Called set selection.

    2. Read the tag to see if the word we want is actually in
    this line of the cache.
    The tag is a type of hash that provides the word address (at
    least part of it).
    Called line matching.

  • 29/51

    Decoding the Address for a
    Cached Word II

    3. Read the valid bit.
    - Only want to read the desired word if the cache line is valid.

    4. Read the block offset value to know which word in the line is
    the word we want.
    - Called word extraction.
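Steps 1-4 amount to slicing bit fields out of the address. A minimal sketch of that decomposition follows; the function name and the particular field widths are illustrative choices, not from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Split an address into (tag, set index, block offset), assuming a
   block size of 2^b bytes and 2^s sets:
   offset = low b bits, set = next s bits, tag = all remaining bits. */
typedef struct { uint32_t tag, set, offset; } addr_parts;

addr_parts decode(uint32_t addr, unsigned b, unsigned s)
{
    addr_parts p;
    p.offset = addr & ((1u << b) - 1);        /* word within the block */
    p.set    = (addr >> b) & ((1u << s) - 1); /* which set to look in  */
    p.tag    = addr >> (b + s);               /* compared during line matching */
    return p;
}
```

For example, with 16-byte blocks (b = 4) and 8 sets (s = 3), the address 0x1234 decodes to offset 0x4, set 3, tag 0x24.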

  • 30/51

    Decoding the Address for a

    Cached Word

  • 31/51

    Order of the Bits Used in the
    Address

    Remember that the bit fields in the actual word
    address are used to access the cache.
    A simple hash function uses these fields to access
    the lines and individual words in the cache.

    Generally: a hash (hash value) is a non-unique
    address used to quickly access a table.
    It is not unique because hash tables can only store
    small subsets of a larger range of data values.
    The pigeonhole principle is in effect.

    The order of the bit field selection is critical to
    effective performance.

  • 32/51

    Order of the Bits Used in the
    Address II

    First, the middle bits are used for set selection.

    Why does this make sense?
    It's undesirable for nearby words in memory to map to the same line
    in the cache.
    It will cause the cache to replace possibly useful data too often.

  • 33/51

    Order of the Bits Used in the
    Address III

    It would go against our knowledge of spatial
    locality.

    Here we want consecutive words in different lines of
    the cache.

    This way, when the program accesses the next
    data word in memory, the cache won't need to
    discard the last one. Instead, the next cache
    line is accessed, so the cache keeps them both.

  • 34/51

    Why Use the Middle Bits as an
    Index?

    High-Order Bit Indexing
    Adjacent memory lines would map to the same cache
    entry: poor use of spatial locality.

    Middle-Order Bit Indexing
    Consecutive memory lines map to
    different cache lines.

    (Least significant bits are not
    shown here.)
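The contrast between the two indexing schemes can be made concrete. In this sketch (function names and parameter choices are illustrative), four consecutive 16-byte blocks land in four different sets under middle-bit indexing, but all in the same set under high-order-bit indexing:

```c
#include <assert.h>
#include <stdint.h>

/* Set index for 2^s sets with 2^b-byte blocks.
   Middle-bit indexing takes the s bits just above the block offset;
   high-bit indexing takes the top s bits of a w-bit address. */
uint32_t set_middle(uint32_t addr, unsigned b, unsigned s)
{
    return (addr >> b) & ((1u << s) - 1);
}

uint32_t set_high(uint32_t addr, unsigned w, unsigned s)
{
    return addr >> (w - s);
}
```

With b = 4, s = 2, and 16-bit addresses, the blocks at 0x0000, 0x0010, 0x0020, 0x0030 get middle-bit set indices 0, 1, 2, 3 (no conflicts), but high-bit set index 0 for all four.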

  • 35/51

    Order of the Bits Used in the
    Address IV

    Second, the top (most significant) bits are used for line
    matching (tag matching).

    Why does this make sense?
    If the process got this far, then the middle order bits
    already match.
    Again, this is using knowledge of spatial locality.

  • 36/51

    Order of the Bits Used in the
    Address V

    If the set index bits match, then the higher order bits need to
    be checked.

    These are only the same if the address is from the same
    higher order bit region of memory.

    With locality and the limited cache size, we know that the
    tag will indicate whether we are dealing with the same
    words (the same block).

  • 37/51

    Order of the Bits Used in the
    Address VI

    Third (finally), the low order bits are used for
    word extraction.

    Why does this make sense?
    These are the most unique identifiers for a word in memory.
    Another word with the same low order bits will be located far
    away in memory.
    Therefore, according to the principle of locality, it is unlikely
    that another word in memory with the same LSBs will be
    needed near the same time.

  • 38/51

    Four Key Questions for
    Designing a Cache

    1. Block Identification - How is a block found if it
    is in the upper level?

    2. Block Placement - Where can a block be placed
    in the upper level?

    3. Block Replacement - Which block should be
    replaced when there is a cache miss?

    4. Write Strategy - What happens on a memory
    write?

  • 39/51

    Q2: Where Can a Block be
    Placed in a Cache?

    There are three approaches to block placement:

    1. Direct Mapped
    The block can only be placed in a specific location in the
    cache.

    2. Set-Associative
    A set is a group of blocks in the cache.
    The incoming block can only be placed in a specific set of
    blocks.
    The block can be placed anywhere within this set.

    3. Fully Associative
    The block can be placed anywhere in the cache.

  • 40/51

    Mapping Memory Blocks into the
    Cache

    This is really a continuum of associative
    properties:

    Direct mapped can be considered 1-way set
    associative (the block has only 1 allowed
    destination).

    For N-way set associative, the block has N
    places it can be placed.

    Fully associative can be considered m-way
    set associative (where m is the number of
    blocks), since the block can go anywhere.

  • 41/51

    Mapping Memory Blocks into the

    Cache II

  • 42/51

    Direct-Mapped Cache

    The simplest kind of cache.
    Characterized by exactly one line per set.

    Therefore, the tag and set index are
    somewhat redundant.
    The tag is insurance that the wrong word is not
    used.

  • 43/51

    Accessing Direct-Mapped
    Caches I

    Step 1: Set selection
    The set index bits in the address identify the set.

  • 44/51

    Accessing Direct-Mapped
    Caches II

    Step 2: Line matching
    Trivial, since there is only one line in this cache set.

  • 45/51

    Accessing Direct-Mapped
    Caches III

    Step 3: Word selection
    Use the block offset to grab the right word.

  • 46/51

    Some Words About Book
    Conventions

    First - most direct mapped caches do not even talk
    about a set index.
    There is only 1 line per set, so it doesn't make much sense.
    They may do it to make the later caches easier to understand,
    but most systems don't use the set index with direct mapped
    organizations.

    Second - the last figure talked about w0 being the lower
    order byte of word w, w1 is the next byte, etc.
    It is easy in this convention to confuse words and bytes.
    It should have used w0 and w1 to talk about the words in the
    cache and then used b00 and b10 to talk about the low order
    bytes of words 0 and 1 in the cache line.

  • 47/51

    Direct-Mapped Cache Simulation
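The simulation figure on this slide did not survive transcription. A minimal sketch in its spirit follows, assuming 4 sets, one line per set, and 2-byte blocks, so a 4-bit address splits into 1 tag bit, 2 set bits, and 1 offset bit (all parameters chosen here for illustration).

```c
#include <stdint.h>
#include <string.h>

#define SETS 4

static uint32_t tags[SETS];   /* tag stored in each set's single line */
static int      valid[SETS];  /* valid bit per line */

void cache_reset(void)
{
    memset(valid, 0, sizeof valid);
}

/* Simulate one access; returns 1 on a hit, 0 on a miss
   (a miss loads the block into its set). */
int access_addr(uint32_t addr)
{
    uint32_t set = (addr >> 1) & (SETS - 1);  /* middle bits select the set */
    uint32_t tag = addr >> 3;                 /* remaining high bit(s)      */
    if (valid[set] && tags[set] == tag)
        return 1;                             /* hit */
    valid[set] = 1;                           /* miss: load the block */
    tags[set] = tag;
    return 0;
}
```

Tracing addresses 0, 1, 8, 0 gives a cold miss, a hit within the same block, and then two conflict misses as blocks 0 and 8 fight over set 0.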

  • 48/51

    Recall: Cache Misses

    A cache miss is when the data was not in
    the cache.

    What can cause a block not to be in the cache
    at a given time? There are three kinds of misses:

    Cold (compulsory) miss
    Conflict miss
    Capacity miss

    What kinds of cache misses did we just
    see?

  • 49/51

    Direct-Mapped Cache Simulation
    Results

  • 50/51

    Direct Mapped Cache
    Observations

    Direct mapped caches are easy to implement.
    Finding the line and the set are one step, since there is only one
    line per set.

    What are the problems?
    The simple example showed that it is already experiencing
    misses due to conflicts.
    The conflicts occurred because there is only one line per set.
    What was needed: a second line for each set index to allow both
    desired values to be placed.

    This leads to set associative caches.
    It will also introduce the need for replacement strategies.
    Up to now, the replacement was only dictated by the address.

  • 51/51

    Set Associative Caches

    Characterized by more than one line perset