B. Ramamurthy. 12 stage pipeline At peak speed, the processor can request both an instruction and...

B. Ramamurthy

12-stage pipeline. At peak speed, the processor can request both an instruction and a data word on every clock.

We cannot afford pipeline stalls. Solution: add a cache.

The cache is 16 KB, with 16-word blocks.

Send address to the appropriate cache. The address comes from either the PC or from the ALU.

If the cache signals a hit, the requested word is available on the data lines.

Since there are 16 words in the desired block, we need to select the right word.

The block offset field is used to select the requested word from the 16 words in the indexed block.
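The field layout described above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: 32-bit byte addresses, 4-byte words, and a direct-mapped cache. The name `split_address` is hypothetical.

```python
# Assumptions (not from the slides): 32-bit byte addresses,
# 4-byte words, direct-mapped cache.
CACHE_BYTES = 16 * 1024
WORDS_PER_BLOCK = 16
BYTES_PER_WORD = 4

BLOCK_BYTES = WORDS_PER_BLOCK * BYTES_PER_WORD   # 64 bytes per block
NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES          # 256 blocks

# Field widths in bits, derived from the sizes above.
BYTE_OFFSET_BITS = (BYTES_PER_WORD - 1).bit_length()    # 2
BLOCK_OFFSET_BITS = (WORDS_PER_BLOCK - 1).bit_length()  # 4
INDEX_BITS = (NUM_BLOCKS - 1).bit_length()              # 8


def split_address(addr):
    """Split a byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & (BYTES_PER_WORD - 1)
    block_offset = (addr >> BYTE_OFFSET_BITS) & (WORDS_PER_BLOCK - 1)
    index = (addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS)) & (NUM_BLOCKS - 1)
    tag = addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset, byte_offset


tag, index, block_offset, byte_offset = split_address(0x12345678)
print(hex(tag), hex(index), block_offset, byte_offset)
```

Under these assumptions the index selects one of 256 blocks and the 4-bit block offset selects one of the 16 words inside it.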

If the cache signals a miss, we send the address to main memory, get the data from main memory, and fill the cache. The data is then read again from the cache.
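The hit and miss steps can be sketched as a small simulation. This is a simplified model, not the actual hardware: it assumes a direct-mapped cache, uses word addresses instead of byte addresses, and models main memory as a Python list. The class name `DirectMappedCache` and its fields are hypothetical.

```python
WORDS_PER_BLOCK = 16
NUM_BLOCKS = 256  # 16 KB / (16 words x 4 bytes), assuming 4-byte words


class DirectMappedCache:
    def __init__(self, memory):
        self.memory = memory               # hypothetical main memory: list of words
        self.tags = [None] * NUM_BLOCKS    # None marks an empty (invalid) entry
        self.blocks = [None] * NUM_BLOCKS
        self.misses = 0

    def read_word(self, word_addr):
        block_offset = word_addr % WORDS_PER_BLOCK
        block_number = word_addr // WORDS_PER_BLOCK
        index = block_number % NUM_BLOCKS
        tag = block_number // NUM_BLOCKS

        if self.tags[index] != tag:
            # Miss: fetch the whole block from main memory and fill the cache.
            self.misses += 1
            start = block_number * WORDS_PER_BLOCK
            self.blocks[index] = self.memory[start:start + WORDS_PER_BLOCK]
            self.tags[index] = tag

        # Hit, or re-read after the fill: select the word within the block.
        return self.blocks[index][block_offset]


memory = list(range(8192))       # each word holds its own address
cache = DirectMappedCache(memory)
cache.read_word(5)               # miss: block 0 is loaded
cache.read_word(6)               # hit: same block
cache.read_word(5 + 16 * 256)    # miss: maps to the same index, different tag
print(cache.misses)
```

The third read shows a conflict: two addresses that share an index but differ in tag evict each other in a direct-mapped cache.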

Let's look at the schematic of the organization: Fig. 7.9

[Figure: CPU, cache, main memory]

What is the bus width? How should the main memory be organized?

Assume that on a cache miss we need:

1 memory cycle to send the address to main memory

15 memory cycles to read a DRAM memory word (assume the bus width is 32 bits = 4 bytes)

1 memory cycle to send a word of data back

Total for a 4-word block access: 1 + 4 x 15 + 4 x 1 = 1 + 60 + 4 = 65 cycles

Bytes received = 1 cache block = 4 words x 4 bytes = 16 bytes

Bytes/cycle = 16/65 = 0.25 (too low for our fast processor!)

What is your solution? We need better bandwidth. Increase the bus width? Memory interleaving? Wide memory organization? See Fig. 7-11.

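Increase memory width (double it to 2 words): 1 + 2 x 15 + 2 x 1 = 1 + 30 + 2 = 33 cycles; 16/33 = 0.48 bytes/cycle

Memory interleaving (4 banks read in parallel, one-word bus): 1 + 15 + 4 x 1 = 20 cycles; 16/20 = 4/5 = 0.8 bytes/cycle

Miss penalty: 65 cycles, then 33 cycles, then 20 cycles (not bad at all)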
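The three miss-penalty figures can be reproduced with a short calculation. This is a sketch under the timing assumptions above (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per bus transfer, 4-word blocks of 4-byte words). The function names are hypothetical, and the handling of fewer banks than words per block is a simplification that is not taken from the slides.

```python
import math

SEND_ADDR = 1     # cycles to send the address to main memory
DRAM_ACCESS = 15  # cycles per DRAM access
TRANSFER = 1      # cycles per bus transfer
BLOCK_WORDS = 4
BYTES_PER_WORD = 4


def wide_memory_penalty(width_words):
    """Memory and bus are width_words wide; accesses happen one after another."""
    accesses = math.ceil(BLOCK_WORDS / width_words)
    return SEND_ADDR + accesses * (DRAM_ACCESS + TRANSFER)


def interleaved_penalty(banks):
    """Banks are read in parallel; words return one at a time on a one-word bus."""
    rounds = math.ceil(BLOCK_WORDS / banks)  # simplification when banks < BLOCK_WORDS
    return SEND_ADDR + rounds * DRAM_ACCESS + BLOCK_WORDS * TRANSFER


def bytes_per_cycle(penalty):
    return BLOCK_WORDS * BYTES_PER_WORD / penalty


for name, penalty in [
    ("one-word-wide memory", wide_memory_penalty(1)),
    ("two-word-wide memory", wide_memory_penalty(2)),
    ("four-bank interleaving", interleaved_penalty(4)),
]:
    print(f"{name}: {penalty} cycles, {bytes_per_cycle(penalty):.2f} bytes/cycle")
```

Interleaving gets most of the benefit of a wide memory while keeping the bus one word wide, because the 15-cycle DRAM latency is paid once instead of once per word.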