Memory hierarchies


Page 1: Memory hierarchies

Memory hierarchies

An introduction to the use of cache memories as a technique for exploiting the ‘locality-of-reference’ principle

Page 2: Memory hierarchies

Recall system diagram

[Diagram: the CPU and Main memory are attached to the system bus, along with several I/O devices]

Page 3: Memory hierarchies

SRAM versus DRAM

• Random Access Memory uses two distinct technologies: ‘static’ and ‘dynamic’
• Static RAM is much faster
• Dynamic RAM is much cheaper
• Static RAM offers ‘persistent’ storage (it holds its data without needing refresh)
• Dynamic RAM provides ‘volatile’ storage (it must be refreshed periodically)
• Our workstations have 1GB of DRAM, but only 512KB of SRAM (a ratio of roughly 2,000 to 1)

Page 4: Memory hierarchies

Data access time

• The time-period between the moment the data-access signal is supplied and the moment the addressed subsystem outputs (or accepts) the data

• FETCH-OPERATION: the CPU places the address of a memory-location on the system bus, and the memory-system responds by placing that location’s data on the system bus

Page 5: Memory hierarchies

Comparative access timings

• Processor registers: 5 nanoseconds
• SRAM memory: 15 nanoseconds
• DRAM memory: 60 nanoseconds
• Magnetic disk: 10 milliseconds
• Optical disk: (even slower)

Page 6: Memory hierarchies

System design decisions

• How much memory?
• How fast?
• How expensive?
• Tradeoffs and compromises
• Modern systems employ a ‘hybrid’ design in which small, fast, expensive SRAM is supplemented by larger, slower, cheaper DRAM

Page 7: Memory hierarchies

Memory hierarchy

[Diagram: the CPU exchanges individual ‘word’ transfers with the cache memory, while the cache memory exchanges multi-word ‘burst’ transfers with main memory]

Page 8: Memory hierarchies

Instructions versus Data

• Modern system designs frequently use a pair of separate cache memories, one for storing processor instructions and another for storing the program’s data

• Why is this a good idea?
• Let’s examine the way a cache works

Page 9: Memory hierarchies

Fetching program-instructions

• CPU fetches its next program-instruction by placing the value from register EIP on the system bus and issuing a ‘read’ signal

• If that instruction has not previously been fetched, it will not be available in the ‘fast’ cache memory, so it must be gotten from the ‘slower’ main memory; however, it will be ‘saved’ in the cache in case it’s needed again very soon (e.g., in a program loop)

Page 10: Memory hierarchies

Most programs use ‘loops’

// EXAMPLE:
int sum = 0, i = 0;
do {
        sum += array[ i ];
        ++i;
} while ( i < 64 );

Page 11: Memory hierarchies

Assembly language loop

        movl    $0, %eax                # initialize accumulator
        movl    $0, %esi                # initialize array-index
again:
        addl    array(,%esi,4), %eax    # add next int
        incl    %esi                    # increment array-index
        cmpl    $64, %esi               # check for exit-condition
        jae     finis                   # index is beyond bounds
        jmp     again                   # else do another addition
finis:

Page 12: Memory hierarchies

Benefits of cache-mechanism

• During the first pass through the loop, all of the loop-instructions must be fetched from the (slow) DRAM memory

• But loops are typically short – maybe only a few dozen bytes – and multiple bytes can be fetched together in a single ‘burst’

• Thus subsequent passes through the loop can fetch all of these same instructions from the (fast) SRAM memory

Page 13: Memory hierarchies

‘Locality-of-Reference’

• The ‘locality-of-reference’ principle says that, with most computer programs, when the CPU fetches an instruction to execute, it is very likely that the next instruction it fetches will be located at a nearby address

Page 14: Memory hierarchies

Accessing data-operands

• Accessing a program’s data differs from accessing its instructions in that the data consists of ‘variables’ as well as ‘constants’

• Instructions are constant, but data may be modified (i.e., may ‘write’ as well as ‘read’)

• This introduces the problem known as ‘cache coherency’ – if a new value gets assigned to a variable, its cached value will be modified, but what will happen to its original memory-value?

Page 15: Memory hierarchies

Cache ‘write’ policies

• ‘Write-Through’ policy: a modified variable in the cache is immediately copied back to the original memory-location, to ensure that the memory and the cache are kept ‘consistent’ (synchronized)

• ‘Write-Back’ policy: a modified variable in the cache is not immediately copied back to its original memory-location – some values nearby might also get changed soon, in which case all adjacent changes could be done in one ‘burst’

Page 16: Memory hierarchies

Locality-of-reference for data

// EXAMPLE (from array-processing):
int price[64], quantity[64], revenue[64];
for (int i = 0; i < 64; i++)
        revenue[ i ] = price[ i ] * quantity[ i ];

Page 17: Memory hierarchies

Cache ‘hit’ or cache ‘miss’

• When the CPU accesses an address, it first looks in the (fast) cache to see if the address’s value is perhaps already there

• If the CPU finds that the address’s value is in the cache, this is called a ‘cache hit’

• Otherwise, if the CPU finds that the value it wants to access is not presently in the cache, this is called a ‘cache miss’

Page 18: Memory hierarchies

Cache-Line Replacement

• Small-size cache-memory quickly fills up
• So some ‘stale’ items need to be removed in order to make room for the ‘fresh’ items
• Various policies exist for ‘replacement’
• But ‘write-back’ of any data in the cache which is ‘inconsistent’ with data in memory can safely be deferred until it’s time for the cached data to be ‘replaced’

Page 19: Memory hierarchies

Performance impact

[Graph: average access time y (vertical axis, from 15ns to 75ns) plotted against the cache-hit proportion x (horizontal axis, from 0 to 1), giving the straight line y = 75 – 60x]

Examples:
If x = .90, then y = 21ns
If x = .95, then y = 18ns

Page 20: Memory hierarchies

Control Register 0

• Privileged software (e.g., a kernel module) can disable Pentium’s cache-mechanism (by using some inline assembly language)

asm(" movl %cr0, %eax ");
asm(" orl $0x60000000, %eax ");
asm(" movl %eax, %cr0 ");

 bit: 31  30  29  18  16   5   4   3   2   1   0
name: PG  CD  NW  AM  WP  NE  ET  TS  EM  MP  PE

Page 21: Memory hierarchies

In-class demonstration

• Install ‘nocache.c’ module to turn off cache
• Execute ‘cr0.cpp’ to verify cache disabled
• Run ‘sieve.cpp’ using the ‘time’ command to measure performance w/o caching
• Remove ‘nocache’ to re-enable the cache
• Run ‘sieve.cpp’ again to measure speedup
• Run ‘fsieve.cpp’ for comparing the speeds of memory-access versus disk-access

Page 22: Memory hierarchies

In-class exercises

• Try modifying the ‘sieve.cpp’ source-code so that each array-element is stored at an address which is in a different cache-line (cache-line size on Pentium is 32 bytes)

• Try modifying the ‘fsieve.cpp’ source-code to omit the file’s ‘O_SYNC’ flag; also try shrinking the ‘m’ parameter so that more than one element is stored in a disk-sector (size of each disk-sector is 512 bytes)