ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview...

21
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Lecture 21: Memory Hierarchy

Transcript of ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview...

Page 1: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Lecture 21: Memory Hierarchy

Page 2: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 2

Overview

Ideally, computer memory would be large and fast• Unfortunately, memory implementation involves tradeoffs

Memory Hierarchy

• Includes caches, main memory, and disks

Caches

• Small and fast

• Contain subset of data from main memory

• Generally close to the processor

Terminology

• Cache blocks, hit rate, miss rate

More mathematical than material from earlier in the course

Page 3: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 3

Recap: Machine Organization

Personal Computer

Processor (CPU)(active)

Computer

Control

Datapath

Memory(passive)

(where programs, & data live whenrunning)

Devices

Input

Output

Page 4: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 4

Memory Basics

Users want large and fast memories

Fact

• Large memories are slow

• Fast memories are small

Large memories use DRAM technology: Dynamic Random Access Memory

• High density, low power, cheap, slow

• Dynamic: needs to be “refreshed” regularly

DRAM access times are 50-70ns at cost of $10 to $20 per GB

• FPM (Fast Page Mode)

Fast memories use SRAM: Static Random Access Memory

• Low density, high power, expensive, fast

• Static: content lasts “forever” (until lose power)

SRAM access times are .5 – 5ns at cost of $400 to $1,000 per GB

Page 5: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 5

Memory Technology

SRAM and DRAM are: Random Access storage

• Access time is the same for all locations (Hardware decoder used)

For even larger and cheaper storage (than DRAM) use hard drive (Disk): Sequential Access

• Very slow, Data accessed sequentially, access time is location dependent, considered as I/O

• Disk access times are 5 to 20 million ns (i.e., msec) at cost of $.20 to $2.00 per GB

Page 6: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 6

Processor-Memory Speed gap Problem

µProc

60%/yr.

(2X/1.5yr)

DRAM

5%/yr.

(2X/15 yrs)1

10

100

1000

1980

1981

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

1982

Processor-Memory

Performance Gap:

(grows 50% / year)

Perf

orm

an

ce

Time

Processor-DRAM Memory Performance GapMotivation for Memory Hierarchy

Page 7: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 7

Need for speed

Assume CPU runs at 3GHz

Every instruction requires

4B of instruction and at

least one memory access

(4B of data)

• 3 * 8 = 24GB/sec

Peak performance of

sequential burst of transfer

(Performance for random

access is much much

slower due to latency)

Memory bandwidth and

access time is a

performance bottleneck

Interface Width Frequency Bytes/Sec

4-way

interleaved

PC1600

(DDR200)

SDRAM 4 x64bits 100 MHz DDR 6.4 GB/s

Opteron

Hyper-

Transport

memory

bus 128bits 200 MHz DDR 6.4 GB/s

Pentium 4 "800

MHz" FSB 64bits 200 MHz QDR 6.4 GB/s

PC2 6400

(DDR-II

800)

SDRAM 64bits 400 MHz DDR 6.4 GB/s

PC2 5300

(DDR-II

667)

SDRAM 64bits 333 MHz DDR 5.3 GB/s

Pentium 4 "533

MHz" FSB 64bits 133 MHz QDR 4.3 GB/s

FSB – Front-Side Bus; DDR – Double Data Rate; SDRAM – Synchronous DRAM

Page 8: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 8

Need for Large Memory

Small memories are fast

So just write small programs

“640 K of memory should be enough for anybody”-- Bill Gates, 1981

Today’s programs require large memories

• Data base applications may require Gigabytes of memory

Page 9: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 9

The Goal: Illusion of large, fast, cheap memory

How do we create a memory that is large, cheap and fast (most of the time)?

Strategy: Provide a Small, Fast Memory which holds a subset of the main memory – called cache

• Keep frequently-accessed locations in fast cache

• Cache retrieves more than one word at a time

• Sequential accesses are faster after first access

Page 10: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 10

Memory Hierarchy

Hierarchy of Levels

• Uses smaller and faster memory technologies close to the processor

• Fast access time in highest level of hierarchy

• Cheap, slow memory furthest from processor

The aim of memory hierarchy design is to have access time close to the highest level and size equal to the lowest level

Page 11: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 11

Memory Hierarchy Pyramid

Processor (CPU)

Size of memory at each level

Level 1

Level 2

Level n

Increasing Distance from

CPU,Decreasing cost

/ MBLevel 3

. . .

transfer datapath: bus

Decreasing

distance

from CPU,

Decreasing

Access Time

(Memory

Latency)

Page 12: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 12

Basic Philosophy

Move data into ‘smaller, faster’ memory

Operate on it

Move it back to ‘larger, cheaper’ memory

• How do we keep track if changed

What if we run out of space in ‘smaller, faster’ memory?

Important Concepts: Latency, Bandwidth

Page 13: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 13

Typical Hierarchy

Notice that the data width is changing

• Why?

Bandwidth: Transfer rate between various levels

• CPU-Cache: 24 GBps

• Cache-Main: 0.5-6.4GBps

• Main-Disk: 187MBps (serial ATA/1500)

CPU

regs

C

a

c

h

e

Memory disk8 B 32 B 4 KB

Cache/MM virtual memory

Page 14: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 14

Why large blocks?

Fetch large blocks at a time

• Take advantage of spatial locality

for (i=0; i < length; i++)

sum += array[i];

• array has spatial locality

• sum has temporal locality

Page 15: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 15

Why Hierarchy works: Natural Locality

The Principle of Locality

• Programs access a relatively small portion of the address space at any second

Memory Address0 2n - 1

Probabilityof reference

Temporal Locality (Locality in Time): Recently accessed data tend to be referenced again soon

Spatial Locality (Locality in Space): nearby items will tend to be referenced soon

1

0

Page 16: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 16

Taking Advantage of Locality

Memory hierarchy

Store everything on disk

Copy recently accessed (and nearby) items from disk to smaller DRAM memory

• Main memory

Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory

• Cache memory attached to CPU

Page 17: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 17

Principle of Locality

Programs access a small proportion of their address space at any time

Temporal locality

• Items accessed recently are likely to be accessed again soon

• e.g., instructions in a loop, induction variables

Spatial locality

• Items near those accessed recently are likely to be accessed soon

• E.g., sequential instruction access, array data

Page 18: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 18

Memory Hierarchy: Terminology

Hit: data appears in upper level in block X

Hit Rate: the fraction of memory accesses found in the upper level

Miss: data needs to be retrieved from a block in the lower level (Block Y)

Miss Rate = 1 - (Hit Rate)

Hit Time: Time to access the upper level which consists of Time to determine hit/miss + upper level access time

Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor

Note: Hit Time << Miss Penalty Lower Level

Upper LevelTo Processor

From ProcessorBlock X

Block Y

Block X

Page 19: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 19

Current Memory Hierarchy

Speed(ns): 1ns 2ns 6ns 100ns 10,000,000ns

Size (MB): 0.0005 0.1 1-4 1000-6000 500,000

Cost ($/MB): -- $10 $3 $0.01 $0.002

Technology: Regs SRAM SRAM DRAM Disk

Control

Data-

path

Processor

reg

s

Secondary

Memory

L2

CacheL1

Cache

Main

Memory

• Cache - Main memory: Speed

• Main memory – Disk (virtual memory): Capacity

Page 20: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 20

How is the hierarchy managed?

Registers « Memory

• By the compiler (or assembly language Programmer)

Cache « Main Memory

• By hardware

Main Memory « Disks

• By combination of hardware and the operating system

• virtual memory

Control

Data-

path

Processor

reg

s

Secondary

MemoryL2

CacheL1

Cache

Main

Memory

Page 21: ECE232: Hardware Organization and Design · 2019-08-29 · ECE232: Memory Hierarchy 2 Overview Ideally, computer memory would be large and fast •Unfortunately, memory implementation

ECE232: Memory Hierarchy 21

Summary

Computer performance is generally determined by the memory hierarchy

An effective hierarchy contains multiple types of memories of increasing size

Caches (our first target) are a subset of main memory• Caches have their own special architecture

• Generally made from SRAM

• Located close to the processor

Main memory and disks

• Bulkier storage

• Main memory: Volatile (loses data when power removed)

• Disk: Non-volatile