L19-MemoryHierarchy


Transcript of L19-MemoryHierarchy

  • Slide 1/26: L19 Memory Hierarchy, Comp 411, Spring 2008, 04/8/2008 (transcript accessed 8/3/2019)

    Memory Hierarchy

    Memory Flavors
    Principle of Locality
    Program Traces
    Memory Hierarchies
    Associativity

    (Study Chapter 7)

  • Slide 2/26

    What Do We Want in a Memory?

    [Figure: miniMIPS CPU connected to MEMORY; the PC fetches an INST, and MADDR/MDATA with an R/W control access data]

                 Capacity         Latency   Cost
    Register     1000s of bits    10 ps     $$$$
    SRAM         100s of Kbytes   0.2 ns    $$$
    DRAM         100s of Mbytes   5 ns      $
    Hard disk*   100s of Gbytes   10 ms
    Want?        1 Gbyte          0.2 ns    cheap

    * non-volatile

  • Slide 3/26

    Tricks for Increasing Throughput

    [Figure: DRAM array of one-bit memory cells arranged as 2^N rows by 2^M columns, with a row address decoder driving the word lines, bit lines feeding a column multiplexer/shifter, and a multiplexed address bus]

    The first thing that should pop into your mind when asked to speed up a digital design: PIPELINING.

    Synchronous DRAM (SDRAM) ($100 per Gbyte)

    [Figure: SDRAM timing; successive data words are clocked out at t1, t2, t3, t4]

    Double-clocked Synchronous DRAM (DDR SDRAM)

  • Slide 4/26

    Hard Disk Drives

    Typical high-end drive:
      Average latency = 4 ms (7200 rpm)
      Average seek time = 8.5 ms
      Transfer rate = 300 Mbytes/s (SATA)
      Capacity = 250 Gbytes
      Cost = $85 (about $0.33 per Gbyte)

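    The 4 ms average latency quoted above is just half a revolution at 7200 rpm. A quick sketch of that arithmetic (not from the slides):

    ```python
    # The 4 ms figure is half a revolution at 7200 rpm: on average the
    # requested sector is half a turn away from the head.
    rpm = 7200
    seconds_per_rev = 60 / rpm                  # ~8.33 ms per revolution
    avg_rotational_latency = seconds_per_rev / 2

    print(f"{avg_rotational_latency * 1e3:.2f} ms")   # 4.17 ms, i.e. ~4 ms

    # The average access also pays the 8.5 ms average seek time.
    avg_seek = 8.5e-3
    print(f"total: {(avg_rotational_latency + avg_seek) * 1e3:.2f} ms")
    ```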

  • Slide 5/26

    Quantity vs Quality

    Your memory system can be BIG and SLOW... or SMALL and FAST.

    [Figure: log-log plot of access time vs. $/GB]
      SRAM:       $5000/GB,  0.2 ns
      DRAM:       $100/GB,   5 ns
      DISK:       $0.33/GB,  10 ms
      DVD Burner: $0.06/GB,  150 ms

    We've explored a range of device-design trade-offs. Is there an ARCHITECTURAL solution to this DILEMMA?

  • Slide 6/26

    Managing Memory via Programming

    In reality, systems are built with a mixture of all these various memory types.

    How do we make the most effective use of each memory? We could push all of these issues off to programmers:
      Keep the most frequently used variables and the stack in SRAM
      Keep large data structures (arrays, lists, etc.) in DRAM
      Keep the biggest data structures (databases) on DISK

    It is harder than you think: data usage evolves over a program's execution.


  • Slide 7/26

    Best of Both Worlds

    What we REALLY want: A BIG, FAST memory! (Keep everything within instant access.)

    We'd like to have a memory system that PERFORMS like 2 GBytes of SRAM, but COSTS like 512 MBytes of slow memory.

    SURPRISE: We can (nearly) get our wish!

    KEY: Use a hierarchy of memory technologies:

    [Figure: CPU connected to a small SRAM cache, which connects to MAIN MEM]

  • Slide 8/26

    Key IDEA

    Keep the most often-used data in a small, fast SRAM (often local to the CPU chip). Refer to main memory only rarely, for the remaining data.

    The reason this strategy works: LOCALITY.

    Locality of Reference: A reference to location X at time t implies that a reference to location X+ΔX at time t+Δt becomes more probable as ΔX and Δt approach zero.

  • Slide 9/26

    Cache

    cache (kash) n.
      1. A hiding place used especially for storing provisions.
      2. A place for concealment and safekeeping, as of valuables.
      3. The store of goods or valuables concealed in a hiding place.
      4. Computer Science. A fast storage buffer in the central processing unit of a computer. In this sense, also called cache memory.

    v. tr. cached, caching, caches. To hide or store in a cache.

  • Slide 10/26

    Cache Analogy

    You are writing a term paper at a table in the library.
      As you work, you realize you need a book.
      You stop writing, fetch the reference, and continue writing.
      You don't immediately return the book; maybe you'll need it again.
      Soon you have a few books at your table and no longer have to fetch more books.

    The table is a CACHE for the rest of the library.

  • Slide 11/26

    Typical Memory Reference Patterns

    [Figure: memory trace plotted as address vs. time, with distinct bands for the program, data, and stack regions]

    MEMORY TRACE: A temporal sequence of memory references (addresses) from a real program.

    TEMPORAL LOCALITY: If an item is referenced, it will tend to be referenced again soon.

    SPATIAL LOCALITY: If an item is referenced, nearby items will tend to be referenced soon.

  • Slide 12/26

    Working Set

    [Figure: the same memory trace (program, data, and stack bands), with a window of width Δt marked on the time axis and the working-set size |S| marked on the address axis]

    S is the set of locations accessed during Δt.

    Working set: a set S which changes slowly w.r.t. access time.

    Working set size: |S|
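    The working-set idea above can be sketched as a count of distinct addresses inside a window of the trace. As a simplification, this measures the window Δt in accesses rather than seconds, and the trace is a made-up example:

    ```python
    # Sketch: working-set size |S| = number of distinct addresses referenced
    # in a window of dt consecutive accesses. The trace is hypothetical.

    def working_set_size(trace, start, dt):
        """Distinct locations touched by trace[start : start + dt]."""
        return len(set(trace[start:start + dt]))

    # A loop sweeping three words over and over: the working set stays small.
    trace = [1000, 1004, 1008, 1000, 1004, 1008, 1000, 1004, 1008]
    print(working_set_size(trace, 0, 6))   # 3: the set changes slowly w.r.t. dt
    ```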

  • Slide 13/26

    Exploiting the Memory Hierarchy

    Approach 1 (Cray, others): Expose the Hierarchy
      Registers, main memory, and disk are each available as storage alternatives.
      Tell programmers: "Use them cleverly."

    Approach 2: Hide the Hierarchy
      Programming model: a SINGLE kind of memory, a single address space.
      The machine AUTOMATICALLY assigns locations to fast or slow memory, depending on usage patterns.

    [Figure: exposed hierarchy shows the CPU beside a small static RAM, a dynamic RAM MAIN MEMORY, and a HARD DISK; hidden hierarchy shows the CPU, an SRAM cache, and MAIN MEM behind one address space]

  • Slide 14/26

    Why We Care

    CPU performance is dominated by memory performance. It is more significant than ISA, circuit optimization, pipelining, etc.

    TRICK #1: How to make slow MAIN MEMORY appear faster than it is.
      Technique: CACHING (this and the next lecture)

    TRICK #2: How to make a small MAIN MEMORY appear bigger than it is.
      Technique: VIRTUAL MEMORY (the lecture after that)

    [Figure: CPU, small static RAM, dynamic RAM MAIN MEMORY, and HARD DISK; the CACHE sits between the CPU and main memory, and VIRTUAL MEMORY / SWAP SPACE sits between main memory and the disk]

  • Slide 15/26

    The Cache Idea: Program-Transparent Memory Hierarchy

    The cache contains TEMPORARY COPIES of selected main memory locations... e.g. Mem[100] = 37.

    GOALS:
      1) Improve the average access time.
      2) Transparency (compatibility, programming ease).

    [Figure: the CPU sends each reference (1.0) to the "CACHE"; a fraction α is satisfied there, and the remaining (1.0 - α) go on to the dynamic RAM "MAIN MEMORY", which holds location 100 = 37]

    HIT RATIO (α): the fraction of references found in the CACHE.
    MISS RATIO (1 - α): the remaining references.

    Average access time:

      t_ave = α·t_c + (1 - α)·(t_c + t_m) = t_c + (1 - α)·t_m

    Challenge: to make the hit ratio as high as possible.

  • Slide 16/26

    How High of a Hit Ratio?

    Suppose we can easily build an on-chip static memory with a 0.8 ns access time, but the fastest dynamic memories that we can buy for main memory have an average access time of 10 ns. How high a hit rate do we need to sustain an average access time of 1 ns?

    Solving t_ave = t_c + (1 - α)·t_m for α:

      α = 1 - (t_ave - t_c)/t_m = 1 - (1 - 0.8)/10 = 0.98, i.e. a 98% hit rate

    WOW, a cache really needs to be good!
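    The same calculation as a sketch in code, using the formula t_ave = t_c + (1 - α)·t_m:

    ```python
    # Sketch: solve t_ave = t_c + (1 - alpha) * t_m for the hit ratio alpha.
    # Times are in ns, taken from the slide.
    t_c, t_m, t_ave = 0.8, 10.0, 1.0

    alpha = 1 - (t_ave - t_c) / t_m
    print(f"required hit ratio: {alpha:.2%}")    # 98.00%

    # Check: plugging alpha back in reproduces the 1 ns target.
    print(f"t_ave = {t_c + (1 - alpha) * t_m:.1f} ns")
    ```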

  • Slide 17/26

    Cache

      Sits between the CPU and main memory.
      A very fast table that stores a TAG and DATA.
      TAG is the memory address.
      DATA is a copy of memory at the address given by the TAG.

    Cache (Tag, Data):
      1000  17
      1040  1
      1032  97
      1008  11

    Memory (Address, Data):
      1000  17
      1004  23
      1008  11
      1012  5
      1016  29
      1020  38
      1024  44
      1028  99
      1032  97
      1036  25
      1040  1
      1044  4

  • Slide 18/26

    Cache Access

    On a load, we look in the TAG entries for the address we're loading:
      Found: a HIT; return the DATA.
      Not found: a MISS; go to memory for the data, and put it and the address (the TAG) in the cache.

    Memory (Address, Data):
      1000  17
      1004  23
      1008  11
      1012  5
      1016  29
      1020  38
      1024  44
      1028  99
      1032  97
      1036  25
      1040  1
      1044  4

    Cache (Tag, Data):
      1000  17
      1040  1
      1032  97
      1008  11
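    The load behavior above can be sketched as a tiny fully-associative cache model (hypothetical code, not the lecture's hardware):

    ```python
    # Hypothetical sketch of the load path: a tiny fully-associative cache
    # modeled as a dict mapping TAG (the address) to DATA.
    memory = {1000: 17, 1004: 23, 1008: 11, 1012: 5, 1016: 29, 1020: 38,
              1024: 44, 1028: 99, 1032: 97, 1036: 25, 1040: 1, 1044: 4}
    cache = {}

    def load(addr):
        if addr in cache:            # found: HIT, return the DATA
            return cache[addr], "HIT"
        data = memory[addr]          # not found: MISS, go to memory...
        cache[addr] = data           # ...and install the TAG and DATA
        return data, "MISS"

    print(load(1000))   # (17, 'MISS')  first reference misses
    print(load(1000))   # (17, 'HIT')   the copy is now cached
    ```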

  • Slide 19/26

    Cache Lines

    Usually the cache fetches more data than requested. (Why?)

      A LINE is the unit of memory stored in the cache.
      A line is usually much bigger than 1 word; 32 bytes per line is common.
      A bigger LINE means fewer misses, because of spatial locality...
      ...but a bigger LINE means a longer time on a miss.

    Cache (Tag, Data: two words per line):
      1000  17  23
      1040  1   4
      1032  97  25
      1008  11  5

    Memory (Address, Data):
      1000  17
      1004  23
      1008  11
      1012  5
      1016  29
      1020  38
      1024  44
      1028  99
      1032  97
      1036  25
      1040  1
      1044  4

  • Slide 20/26

    Finding the TAG in the Cache

    A 1 MByte cache may have 32k different lines, each of 32 bytes.

    We can't afford to sequentially search the 32k different tags.

    ASSOCIATIVE memory uses hardware to compare the incoming address to all of the tags in parallel, but it is expensive, and a 1 MByte associative cache is thus unlikely.

    [Figure: fully associative lookup; the incoming address is compared (=?) against every TAG in parallel, and any match raises HIT and drives that line's Data onto Data Out]

  • Slide 21/26

    Finding the TAG in the Cache

    A 1 MByte cache may have 32k different lines, each of 32 bytes.

    We can't afford to sequentially search the 32k different tags.

    ASSOCIATIVE memory uses hardware to compare the incoming address to all of the tags in parallel, but it is expensive, and a 1 MByte associative cache is thus unlikely.

    A DIRECT-MAPPED CACHE computes the cache entry from the address:
      multiple addresses map to the same cache line;
      use the TAG to determine if it is the right one.

    Choose some bits from the address to determine the cache line:
      the low 5 bits determine which byte within the line;
      we need 15 bits to determine which of the 32k different lines has the data;
      which of the 32 - 5 = 27 remaining bits should we use?
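    The bit-slicing described above can be sketched as follows. The 5-bit offset and 15-bit index are the slide's numbers; the example addresses are made up for illustration:

    ```python
    # Sketch of the address split described above: 5 offset bits (32-byte
    # lines) and 15 index bits (32k lines); the high bits form the TAG.
    OFFSET_BITS = 5     # 2**5  = 32 bytes per line
    INDEX_BITS = 15     # 2**15 = 32k lines

    def split_address(addr):
        offset = addr & ((1 << OFFSET_BITS) - 1)
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    # Two addresses exactly 1 MByte apart share an index (they collide in
    # the same cache line) but differ in TAG.
    print(split_address(0x00012344))   # (0, 2330, 4)
    print(split_address(0x00112344))   # (1, 2330, 4)
    ```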

  • Slide 22/26

    Direct-Mapping Example

    Cache (Tag, Data: two words per line, four lines):
      line 0:  1024  44  99
      line 1:  1000  17  23
      line 2:  1040  1   4
      line 3:  1016  29  38

    Memory (Address, Data):
      1000  17
      1004  23
      1008  11
      1012  5
      1016  29
      1020  38
      1024  44
      1028  99
      1032  97
      1036  25
      1040  1
      1044  4

    With 8-byte lines, the bottom 3 bits determine the byte within the line.

    With 4 cache lines, the next 2 bits determine which line to use.

      1024d = 10000000000b → line = 00b = 0d
      1000d = 01111101000b → line = 01b = 1d
      1040d = 10000010000b → line = 10b = 2d
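    A quick check of the worked example, as a sketch:

    ```python
    # Check of the example above: 8-byte lines give 3 offset bits, and
    # 4 cache lines give 2 index bits, so the line index is bits [4:3].
    def line_index(addr):
        return (addr >> 3) & 0b11

    for addr in (1024, 1000, 1040):
        print(addr, "maps to line", line_index(addr))
    # 1024 maps to line 0; 1000 maps to line 1; 1040 maps to line 2
    ```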

  • Slide 23/26

    Direct Mapping Miss

    Cache (Tag, Data: two words per line, four lines):
      line 0:  1024  44  99
      line 1:  1000  17  23
      line 2:  1040  1   4
      line 3:  1016  29  38

    Memory (Address, Data):
      1000  17
      1004  23
      1008  11
      1012  5
      1016  29
      1020  38
      1024  44
      1028  99
      1032  97
      1036  25
      1040  1
      1044  4

    What happens when we now ask for address 1008?

      1008d = 01111110000b → line = 10b = 2d

    but earlier we put 1040d there...

      1040d = 10000010000b → line = 10b = 2d

    so 1040's line is evicted and replaced:

      line 2:  1008  11  5

  • Slide 24/26

    Miss Penalty and Rate

    The MISS PENALTY is the time it takes to read the memory if it isn't in the cache. 50 to 100 cycles is common.

    The MISS RATE is the fraction of accesses which MISS.

    The HIT RATE is the fraction of accesses which HIT.

    MISS RATE + HIT RATE = 1

    Suppose a particular cache has a MISS PENALTY of 100 cycles and a HIT RATE of 95%. The CPI for a load on a HIT is 5, but on a MISS it is 105. What is the average CPI for a load?

      Average CPI = 5 × 0.95 + 105 × 0.05 = 10

    Suppose the MISS PENALTY were 120 cycles instead?

      Then CPI = 5 × 0.95 + 125 × 0.05 = 11 (slower memory doesn't hurt much)
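    The CPI arithmetic above, as a sketch:

    ```python
    # The slide's average-CPI arithmetic: weight the hit and miss CPIs by
    # the hit and miss rates (miss CPI = hit CPI + miss penalty).
    def avg_cpi(hit_cpi, miss_penalty, hit_rate):
        miss_cpi = hit_cpi + miss_penalty
        return hit_cpi * hit_rate + miss_cpi * (1 - hit_rate)

    print(round(avg_cpi(5, 100, 0.95), 2))   # 10.0, as on the slide
    print(round(avg_cpi(5, 120, 0.95), 2))   # 11.0: slower memory doesn't hurt much
    ```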

  • Slide 25/26

    What about store?

    What happens in the cache on a store?
      WRITE-BACK CACHE: put it in the cache; write to memory on replacement.
      WRITE-THROUGH CACHE: put it in the cache and in memory.

    What happens on a store that MISSES?
      WRITE-BACK will fetch the line into the cache.
      WRITE-THROUGH might just put it in memory.
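    The two store policies can be contrasted with a minimal sketch (a hypothetical model, not from the slides):

    ```python
    # Hypothetical sketch contrasting the two store policies above, on a
    # cache modeled as a dict plus a set of dirty (not-yet-written) tags.
    memory = {1000: 17}
    cache, dirty = {}, set()

    def store_write_through(addr, val):
        cache[addr] = val        # put it in the cache...
        memory[addr] = val       # ...and write memory immediately

    def store_write_back(addr, val):
        cache[addr] = val        # put it in the cache only
        dirty.add(addr)          # memory is updated later, on replacement

    def evict(addr):
        if addr in dirty:        # write-back: flush dirty data on eviction
            memory[addr] = cache[addr]
            dirty.discard(addr)
        cache.pop(addr, None)

    store_write_back(1000, 99)
    print(memory[1000])   # still 17: memory not yet updated
    evict(1000)
    print(memory[1000])   # now 99: written back on replacement
    ```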

  • Slide 26/26

    Cache Questions = Cash Questions

    What lies between fully associative and direct-mapped?
    When I put something new into the cache, what data gets thrown out?
    How many processor words should there be per tag?
    When I write to the cache, should I also write to memory?
    What do I do when a write misses the cache? Should space in the cache be allocated for the written address?
    What if I have INPUT/OUTPUT devices located at certain memory addresses? Do we cache them?