Anne Bracy CS 3410 · The slides are the product of many rounds of teaching CS 3410 by Professors...
Anne Bracy, CS 3410
Computer Science, Cornell University
See P&H Appendix B.8 (register files) and B.9
The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer.
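A Single-cycle processor
[Figure: single-cycle processor datapath. Labels: PC, +4, inst, memory, register file, imm, extend, alu, =?, cmp, control, offset, target, new pc, and a second memory with addr, din, dout. Annotation: focus for today.]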
Memory
• Register Files
• Tri-state devices
• SRAM (Static RAM, random access memory)
• DRAM (Dynamic RAM)
Register File
• N read/write registers
• Indexed by register number
Dual-Read-Port, Single-Write-Port 32 x 32 Register File
[Figure: register file block. Inputs: DW (32 bits), W (1 bit), RW, RA, RB (5 bits each). Outputs: QA, QB (32 bits each).]
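Recall: Register
• D flip-flops in parallel
• shared clock
• extra clocked inputs: write_enable, reset, …
[Figure: four D flip-flops with inputs D0, D1, D2, D3 sharing clk, also drawn as a single 4-bit reg block with a 4-bit input, a 4-bit output, and clk.]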
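The register recalled above can be sketched in software as a value that only changes on a clock edge. This is a minimal behavioral model, not a gate-level one; the class name `Register` and the method `clock_edge` are hypothetical names chosen for illustration, and giving reset priority over write_enable is an assumption.

```python
class Register:
    """Toy model of an n-bit register: n D flip-flops sharing one clock."""

    def __init__(self, width=4):
        self.width = width
        self.q = 0  # stored value, always visible on the outputs

    def clock_edge(self, d, write_enable=True, reset=False):
        """Simulate one rising clock edge and return the new output."""
        if reset:
            # Assumption: reset takes priority over write_enable.
            self.q = 0
        elif write_enable:
            # Every flip-flop latches its own input bit on the same edge.
            self.q = d & ((1 << self.width) - 1)
        return self.q


reg = Register(width=4)
reg.clock_edge(0b1010)                      # q becomes 0b1010
reg.clock_edge(0b0101, write_enable=False)  # q stays 0b1010
```

Between clock edges the output holds its value, which is what lets a register file keep state across instructions.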
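Register File
• N read/write registers
• Indexed by register number
addi r5, r0, 10
How to write to one register in the register file?
• Need a decoder
[Figure: a 5-to-32 decoder takes the 5-bit RW (e.g. …00101) and selects one of Reg 0, Reg 1, …, Reg 30, Reg 31; the 32-bit D input feeds the registers.]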
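3-to-8 decoder
i2 i1 i0 | o0 o1 o2 o3 o4 o5 o6 o7
 0  0  0 |  1  0  0  0  0  0  0  0
 0  0  1 |  0  1  0  0  0  0  0  0
 0  1  0 |  0  0  1  0  0  0  0  0
 0  1  1 |  0  0  0  1  0  0  0  0
 1  0  0 |  0  0  0  0  1  0  0  0
 1  0  1 |  0  0  0  0  0  1  0  0
 1  1  0 |  0  0  0  0  0  0  1  0
 1  1  1 |  0  0  0  0  0  0  0  1
[Figure: 3-bit RW input (e.g. 101) into the 3-to-8 decoder.]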
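The decoder's behavior can be written as a one-line rule: exactly one output is 1, namely the one whose index equals the binary input. A small sketch, where `decode` is a hypothetical helper name, not something from the slides:

```python
def decode(value, n_bits):
    """Return the 2**n_bits one-hot outputs [o0, o1, ...] for an n-bit input."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("input does not fit in n_bits")
    return [1 if i == value else 0 for i in range(1 << n_bits)]


print(decode(0b101, 3))  # [0, 0, 0, 0, 0, 1, 0, 0]
```

The same function with `n_bits=5` models the 5-to-32 decoder used on the register file's write port.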
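Register File
• N read/write registers
• Indexed by register number
addi r5, r0, 10
How to write to one register in the register file?
• Need a decoder
[Figure: 5-to-32 decoder with inputs RW (5 bits) and W, selecting among Reg 0, Reg 1, …, Reg 30, Reg 31; the 32-bit D input feeds the registers.]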
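Register File
• N read/write registers
• Indexed by register number
How to read from two registers?
• Need a multiplexor
[Figure: the 32-bit outputs of Reg 0, Reg 1, …, Reg 30, Reg 31 feed two MUXes; RA (5 bits) selects the 32-bit output QA and RB (5 bits) selects the 32-bit output QB.]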
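A multiplexor picks one of its inputs according to the select bits. One common way to build a wide mux, assumed here for illustration, is a tree of 2-to-1 muxes; the function names `mux2` and `mux` are hypothetical.

```python
def mux2(a, b, s):
    """2-to-1 mux: returns a when s == 0, b when s == 1."""
    return b if s else a


def mux(inputs, select_bits):
    """2**k-to-1 mux built as a tree of 2-to-1 muxes.

    select_bits is [s0, s1, ...], least significant bit first.
    """
    if len(inputs) != 1 << len(select_bits):
        raise ValueError("need exactly 2**k inputs for k select bits")
    level = list(inputs)
    for s in select_bits:
        # Each tree level halves the number of candidates.
        level = [mux2(level[i], level[i + 1], s) for i in range(0, len(level), 2)]
    return level[0]


# s2 s1 s0 = 1 0 1 selects input number 5
print(mux(list("abcdefgh"), [1, 0, 1]))  # f
```

A register-file read port is this mux applied to each of the 32 bit positions, with the register number (RA or RB) as the select input.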
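Register File
• N read/write registers
• Indexed by register number
Implementation:
• D flip flops to store bits
• Decoder for each write port
• Mux for each read port
[Figure: complete register file. A 5-to-32 decoder (inputs RW, 5 bits, and W) on the write side; D (32 bits) into Reg 0, Reg 1, …, Reg 30, Reg 31; two MUXes selected by RA and RB (5 bits each) producing QA and QB (32 bits each).]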
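Putting the three pieces together, a dual-read-port, single-write-port register file can be sketched behaviorally as follows. This is an illustrative model under stated assumptions: the class and method names are hypothetical, reads are combinational, writes happen on a clock edge, and register 0 is treated like any other register (the slides do not say whether it is hardwired to zero here).

```python
class RegisterFile:
    """Behavioral sketch of a 32 x 32 register file (2 read ports, 1 write port)."""

    def __init__(self, n_regs=32, width=32):
        self.width = width
        self.regs = [0] * n_regs  # each entry stands for a row of D flip-flops

    def read(self, ra, rb):
        """Two read ports: each acts as a mux selecting one register."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, dw, w):
        """Write port: the decoder output for rw, gated by W, enables one register."""
        mask = (1 << self.width) - 1
        enables = [bool(w) and i == rw for i in range(len(self.regs))]
        self.regs = [dw & mask if en else q for en, q in zip(enables, self.regs)]


rf = RegisterFile()
qa, _ = rf.read(0, 0)                 # read r0
rf.clock_edge(rw=5, dw=qa + 10, w=1)  # write-back step of: addi r5, r0, 10
print(rf.read(5, 0))                  # (10, 0)
```

With `w=0` the decoder enables nothing, so a clock edge leaves every register unchanged.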
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Doesn't scale
  e.g. 32 Mb register file with 32 bit registers
  Need 32 x 1M-to-1 multiplexor and 32 x 20-to-1M decoder
  How many logic gates/transistors?
[Figure: 8-to-1 mux with data inputs a, b, c, d, e, f, g, h and select inputs s2, s1, s0.]
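The scaling question above can be explored with a little arithmetic. The sketch below assumes "32 Mb" means 32 x 2^20 bits and that each 1M-to-1 mux is built as a tree of 2-to-1 muxes (a 2^k-to-1 tree needs 2^k - 1 of them); both are assumptions for illustration, and `register_file_scale` is a hypothetical helper.

```python
def register_file_scale(total_bits, width):
    """Return (registers, address bits, 2-to-1 muxes per read port)."""
    n_regs = total_bits // width
    addr_bits = (n_regs - 1).bit_length()  # bits needed to index n_regs
    # One n_regs-to-1 mux per data bit, each a tree of (n_regs - 1) 2-to-1 muxes.
    mux2_per_read_port = width * (n_regs - 1)
    return n_regs, addr_bits, mux2_per_read_port


print(register_file_scale(32 * 32, 32))       # (32, 5, 992)
print(register_file_scale(32 * 2**20, 32))    # (1048576, 20, 33554400)
```

Under these assumptions a single read port alone needs tens of millions of 2-to-1 muxes, before counting the decoder or the storage cells.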
Memory
• CPU: Register Files (i.e. Memory w/in the CPU)
• Scaling Memory: Tri-state devices
• Cache: SRAM (Static RAM, random access memory)
• Memory: DRAM (Dynamic RAM)
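Need a shared bus (or shared bit line)
• Many FlipFlops/outputs/etc. connected to single wire
• Only one output drives the bus at a time
• How do we build such a device?
[Figure: data inputs D0, D1, D2, D3, …, D1023 with select signals S0, S1, S2, S3, …, S1023, all connected to one shared line.]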
Tri-State Buffers
• If enabled (E=1), then Q = D
• Otherwise, Q is not connected (z = high impedance)
E D | Q
0 0 | z
0 1 | z
1 0 | 0
1 1 | 1
[Figure: tri-state buffer symbol with data input D, enable E, and output Q.]
[Figure: the shared line again, with D0, D1, D2, D3, …, D1023 gated by S0, S1, S2, S3, …, S1023.]
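The tri-state truth table and the shared line can be modeled together. This is a rough sketch: the string `"z"` stands in for high impedance, and raising an error when two buffers are enabled at once is a modeling choice made here (in hardware that situation is a bus conflict, not an exception). The names `tri_state` and `shared_line` are hypothetical.

```python
Z = "z"  # high impedance: the buffer is not driving the line


def tri_state(d, e):
    """Q = D when E == 1, otherwise Q is disconnected (z)."""
    return d if e else Z


def shared_line(data, enables):
    """Value seen on a wire driven by many tri-state buffers."""
    driven = [tri_state(d, e) for d, e in zip(data, enables)]
    values = [v for v in driven if v != Z]
    if len(values) > 1:
        raise ValueError("bus contention: more than one output enabled")
    return values[0] if values else Z


print(shared_line([1, 0, 1, 1], [0, 0, 1, 0]))  # 1 (only D2 drives the line)
print(shared_line([1, 0, 1, 1], [0, 0, 0, 0]))  # z (nothing drives the line)
```

If the enables come from a decoder, at most one of them is ever 1, which is what guarantees a single driver.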
Register files are very fast storage (only a few gate delays), but do not scale to large memory sizes.
Tri-state Buffers allow scaling since multiple registers can be connected to a single output, while only one register actually drives the output.
• Storage Cells + Tri-State Buffers
• Inputs: Address, Data (for writes)
• Outputs: Data (for reads)
• Also need R/W signal (not shown)
• N address bits → 2^N words total
• M data bits → each word M bits
[Figure: memory block with an N-bit Address input and M-bit Data lines.]
• Storage Cells + Tri-State Buffers
• Decoder selects a word line
• R/W selector determines access type
• Word line is then coupled to the data lines
[Figure: Address feeds a Decoder that drives the word lines; R/W controls access to the data lines.]
E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
[Figure: 4 x 2 Memory. A 2-to-4 decoder takes the 2-bit Address and drives rows 0, 1, 2, 3. Each row holds two D flip-flops (D, Q), each with an enable input. Data inputs Din[1], Din[2]; data outputs Dout[1], Dout[2]; control inputs Write Enable and Output Enable.]
E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
[Figure: the same 4 x 2 memory (2-to-4 decoder, 2-bit Address, Din[1], Din[2], Dout[1], Dout[2], Write Enable, Output Enable), with the Bitlines labeled.]
E.g. How do we design a 4 x 2 Memory Module (i.e. 4 word lines that are each 2 bits wide)?
[Figure: the same 4 x 2 memory (2-to-4 decoder, 2-bit Address, Din[1], Din[2], Dout[1], Dout[2], Write Enable, Output Enable), with the Wordlines labeled.]
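The 4 x 2 memory module can be sketched behaviorally: a 2-bit address is decoded into four word lines, Write Enable gates the store, and Output Enable gates the tri-state drivers onto the bit lines. The class `Memory4x2`, its `access` method, and the use of `"z"` for an undriven line are assumptions made for this example; the returned tuple stands for (Dout[1], Dout[2]).

```python
Z = "z"  # undriven bit line


class Memory4x2:
    """Behavioral sketch of a memory with 4 words of 2 bits each."""

    def __init__(self):
        self.cells = [[0, 0] for _ in range(4)]

    def access(self, address, din=(0, 0), write_enable=0, output_enable=0):
        if not 0 <= address < 4:
            raise ValueError("address must fit in 2 bits")
        # 2-to-4 decoder: exactly one word line is high.
        word_lines = [1 if i == address else 0 for i in range(4)]
        for row, selected in zip(self.cells, word_lines):
            if selected and write_enable:
                row[:] = [din[0] & 1, din[1] & 1]
        # Tri-state buffers: only the selected word may drive the bit lines.
        if not output_enable:
            return (Z, Z)
        return tuple(self.cells[word_lines.index(1)])


mem = Memory4x2()
mem.access(2, din=(1, 0), write_enable=1)
print(mem.access(2, output_enable=1))  # (1, 0)
print(mem.access(2))                   # ('z', 'z')
```

The same structure generalizes to N address bits and M data bits: 2^N word lines, each enabling a row of M cells.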
Typical SRAM Cell
[Figure: SRAM cell connected to the bit lines B and B̄ through Pass-Through Transistors controlled by the word line.]
Each cell stores one bit, and requires 4 – 8 transistors (6 is typical)
SRAM
• A few transistors (~6) per cell
• Used for working memory (caches)
• But for even higher density…
Dynamic-RAM (DRAM)
• Data values require constant refresh
[Figure: DRAM cell. The word line controls a Pass-Through Transistor between the bit line and a Capacitor connected to Gnd.]
Each cell stores one bit, and requires 1 transistor
Single transistor vs. many gates
• Denser, cheaper ($30/1GB vs. $30/2MB)
• But more complicated, and has analog sensing
Also needs refresh
• Read and write back…
• …every few milliseconds
• Organized in 2D grid, so can do rows at a time
• Chip can do refresh internally
Hence… slower and energy inefficient
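Why refresh is needed can be illustrated with a toy model of one DRAM cell whose stored charge leaks away over time. The leak rate and sensing threshold below are arbitrary numbers picked for the example, not measurements of any real device, and `DramCell` is a hypothetical name.

```python
class DramCell:
    """Toy DRAM cell: a bit stored as charge on a leaky capacitor."""

    LEAK_PER_MS = 0.9  # assumed: fraction of charge kept per millisecond
    THRESHOLD = 0.5    # assumed: level above which the sensed bit is 1

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def tick(self, ms=1):
        """Let time pass; the capacitor loses charge."""
        self.charge *= self.LEAK_PER_MS ** ms

    def read(self):
        """Sense the bit, then write it back (this is also what refresh does)."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.write(bit)
        return bit


neglected = DramCell()
neglected.write(1)
neglected.tick(10)        # 10 ms with no refresh
print(neglected.read())   # 0: the stored 1 has leaked away

refreshed = DramCell()
refreshed.write(1)
for _ in range(4):
    refreshed.tick(5)
    refreshed.read()      # refresh every 5 ms restores full charge
print(refreshed.read())   # 1
```

Each refresh is a read followed by a write back, which costs time and energy even when the processor is not using the data.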
Register File tradeoffs
+ Very fast (a few gate delays for both read and write)
+ Adding extra ports is straightforward
– Expensive, doesn't scale
– Volatile
Volatile Memory alternatives: SRAM, DRAM, …
– Slower
+ Cheaper, and scales well
– Volatile
Non-Volatile Memory (NV-RAM): Flash, EEPROM, …
+ Scales well
– Limited lifetime; degrades after 100,000 to 1M writes
We finally have the building blocks to build machines that can perform non-trivial computational tasks
Register File: Tens of words of working memory
SRAM: Millions of words of working memory
DRAM: Billions of words of working memory
NVRAM: long term storage (usb fob, solid state disks, BIOS, …)
Next time we will build a simple processor!