CS250 VLSI Systems Design (cs250/fa20/files/lec11..., 2020-10-29)
CS250, UC Berkeley Fall ’20, Lecture 10
CS250 VLSI Systems Design
Fall 2020
John Wawrzynek
with
Arya Reais-Parsi
State Elements
‣ Our FPGA has “state elements” in multiple places:
‣ Shift register for configuration loading
‣ Configuration state (PIPs, LUTs, options, …)
‣ SRAM blocks
‣ CLB flip-flops
‣ MAC pipelining
‣ The standard cell library will have flip-flops and latches
‣ We might want custom/optimized versions
‣ Or perhaps use standard cells, but with custom “tiling”
‣ SRAM compiler for RAM blocks (with special cells)
Latches
‣ Usually defined as “level-sensitive” (as opposed to edge-triggered)
‣ Sometimes called a “transparent” latch
‣ Uses in:
‣ Large memory blocks: SRAM
‣ Flip-flops
‣ Shift registers
‣ Standalone registers, etc.
Historical Perspective for Latches
‣ The ’80s into the ’90s
‣ “Dynamic” circuits: fast, small, low-power, unreliable!
[Figure: nMOS version]
Dynamic CMOS Shifter
‣ Now requires a 2-phase clock plus complements
‣ Eventually, “true single-phase” clocking schemes emerged.
‣ However, dynamic state elements have largely been abandoned (not counting DRAM memory)
‣ Why?
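To see why the two clock phases must not overlap, here is a minimal behavioral sketch of a dynamic shift register. It assumes an idealized model (no charge leakage, no clock complements modeled, unit-step evaluation), and the names `evaluate` and `shift` are hypothetical: even-numbered stages are transparent while phi1 is high, odd-numbered stages while phi2 is high, and an opaque stage simply keeps its stored value.

```python
from typing import List


def evaluate(stages: List[int], data_in: int, phi1: int, phi2: int) -> List[int]:
    """Evaluate every stage once for the given clock phase levels.

    Stage k (0-based) is clocked by phi1 when k is even and by phi2 when k is
    odd. A stage whose clock is high is transparent and copies its input; a
    stage whose clock is low keeps its stored value (leakage is ignored).
    Stages are evaluated from input to output, so a value can ripple through
    several transparent stages in a single evaluation.
    """
    out = list(stages)
    prev = data_in
    for k in range(len(out)):
        clk = phi1 if k % 2 == 0 else phi2
        if clk:
            out[k] = prev
        prev = out[k]
    return out


def shift(bits: List[int], n_stages: int, overlap: bool = False) -> List[int]:
    """Clock one input bit per cycle into an n_stages dynamic shift register."""
    stages = [0] * n_stages
    for b in bits:
        if overlap:
            # Faulty clocking: both phases high at the same time.
            stages = evaluate(stages, b, 1, 1)
        else:
            # Non-overlapping two-phase clock: phi1 pulse, gap, phi2 pulse, gap.
            for phi1, phi2 in ((1, 0), (0, 0), (0, 1), (0, 0)):
                stages = evaluate(stages, b, phi1, phi2)
    return stages


print(shift([1, 0], 4))             # [0, 0, 1, 1]: one stage pair per cycle
print(shift([1], 4, overlap=True))  # [1, 1, 1, 1]: data races through
```

With non-overlapping phases, each bit advances exactly one stage pair per clock cycle; if both phases are high at once, every stage is transparent and the input races straight through.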
CMOS Static Latches
‣ All circuits that implement static latches in CMOS are based on “cross-coupled inverters”
‣ While powered on: two stable states, one metastable state.
‣ Do we worry about metastability?
(Explains “robustness”)
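The two stable states and the single metastable state can be illustrated numerically by iterating a node voltage around the loop of two inverters. This is a sketch under stated assumptions: the inverter is modeled by a hypothetical smooth tanh transfer curve with normalized `VDD = 1.0` and an assumed `GAIN = 10.0`, not by a transistor model.

```python
import math

VDD = 1.0    # normalized supply voltage (assumption)
GAIN = 10.0  # steepness of the idealized inverter transfer curve (assumption)


def inverter(v_in: float) -> float:
    """Idealized inverter transfer curve: smooth, inverting, switching at VDD/2."""
    return 0.5 * VDD * (1.0 - math.tanh(GAIN * (v_in - 0.5 * VDD)))


def loop(v: float) -> float:
    """One trip around the cross-coupled pair: node Q -> Q' -> back to Q."""
    return inverter(inverter(v))


def settle(v0: float, trips: int = 100) -> float:
    """Node voltage after many trips around the loop."""
    v = v0
    for _ in range(trips):
        v = loop(v)
    return v


def trips_to_resolve(offset: float, margin: float = 0.4) -> int:
    """Trips needed for a start of VDD/2 + offset to move margin*VDD from the midpoint."""
    if offset == 0.0:
        raise ValueError("exactly VDD/2 is the metastable point and never resolves")
    v = 0.5 * VDD + offset
    trips = 0
    while abs(v - 0.5 * VDD) < margin * VDD:
        v = loop(v)
        trips += 1
    return trips
```

Starting exactly at VDD/2 stays there (the metastable point), while any small offset is amplified toward a rail; the smaller the initial offset, the more trips (time) it takes to resolve, which is why metastability matters when a latch samples a changing input.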
Making Practical Latches
‣ Reading is easy in the case of isolated latches (more complex in SRAM arrays):
‣ Output capacitive load can affect the writing time.
‣ Decoupling with an output buffer helps in timing analysis (common in standard cell designs)
Writing a Latch 1
‣ Overpower the state:
[Figure: latch with D and Enable inputs]
Writing a Latch 1
‣ Overpower the state (differential version):
[Figure: differential write circuit; labels D and Enable]
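Writing a Latch 2
‣ Break the loop (with switches)
‣ Requires en and en’
‣ Again, standard cell versions:
‣ Will add an output inverter for decoupling, which makes the write delay independent of the load.
‣ Add an input inverter to make the write delay independent of the previous stage's drive.
[Figure: latch whose loop is broken by switches controlled by en and en’]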
Addressing the Write Problem
Implemented with a MUX.
Implemented with tristate buffers.
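The MUX form of the latch can be written as Q_next = en·D + en’·Q: when enable is high the MUX selects D (transparent), and when it is low it selects the fed-back Q (the loop is closed and holds). A minimal behavioral sketch, with the hypothetical helper names `d_latch` and `simulate`:

```python
from typing import Iterable, List, Tuple


def d_latch(d: int, en: int, q: int) -> int:
    """Positive level-sensitive D latch as a 2:1 MUX: Q_next = en*D + en'*Q."""
    return d if en else q


def simulate(samples: Iterable[Tuple[int, int]], q0: int = 0) -> List[int]:
    """Apply a sequence of (D, en) samples and record Q after each one."""
    q = q0
    trace = []
    for d, en in samples:
        q = d_latch(d, en, q)
        trace.append(q)
    return trace


# Q follows D while en = 1 and freezes while en = 0.
print(simulate([(1, 1), (0, 1), (1, 1), (0, 0), (1, 0), (0, 1)]))
```

Note that Q tracks every change of D while enable is high, unlike an edge-triggered flip-flop.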
Writing a Latch 3
‣ Break the loop (with logic)
‣ Cross-coupled logic gates
Cross-Coupled NOR Gates
‣ If both R = 0 and S = 0, the cross-coupled NORs are equivalent to a stable latch:
‣ If either R or S becomes 1, the state may change:
‣ What happens if R or S or both become 1?
Remember: [Figure]
Asynchronous State Transition Diagram
SR Latch:
• S is the “set” input
• R is the “reset” input
• QQ’ = 00 is often called a “forbidden state”
• Transitions are triggered by input changes.
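The state transitions above can be reproduced with a simple unit-delay model in which both NOR gates update simultaneously, Q = NOR(R, Q’) and Q’ = NOR(S, Q), until the outputs stop changing. This is a sketch, not a circuit simulation; the function names are hypothetical, and the unit-delay assumption is what makes the final case oscillate.

```python
from typing import Optional, Tuple


def nor(a: int, b: int) -> int:
    return 0 if (a or b) else 1


def nor_sr_step(s: int, r: int, q: int, qb: int) -> Tuple[int, int]:
    """Update both cross-coupled NOR gates at once: Q = NOR(R, Q'), Q' = NOR(S, Q)."""
    return nor(r, qb), nor(s, q)


def nor_sr_settle(s: int, r: int, q: int, qb: int,
                  max_steps: int = 10) -> Optional[Tuple[int, int]]:
    """Hold inputs S, R until (Q, Q') stops changing; None if it never settles."""
    for _ in range(max_steps):
        nq, nqb = nor_sr_step(s, r, q, qb)
        if (nq, nqb) == (q, qb):
            return q, qb
        q, qb = nq, nqb
    return None


print(nor_sr_settle(1, 0, 0, 1))  # set
print(nor_sr_settle(0, 1, 1, 0))  # reset
print(nor_sr_settle(1, 1, 1, 0))  # S = R = 1 forces the "forbidden" QQ' = 00
print(nor_sr_settle(0, 0, 0, 0))  # releasing both at once: no stable result here
```

In this idealized model, releasing S and R together from QQ’ = 00 oscillates forever; in a real circuit, mismatched gate delays make the latch fall into one state or the other unpredictably, which is why the state is avoided.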
NAND-Gate-Based SR Latch
• Same behavior as cross-coupled NORs with inverted inputs.
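The equivalence can be checked exhaustively with the same unit-delay settling model used for the NOR latch. This sketch assumes the NAND latch takes active-low inputs S’ and R’ with Q = NAND(S’, Q’) and Q’ = NAND(R’, Q); all function names are hypothetical.

```python
from typing import Callable, Optional, Tuple

State = Tuple[int, int]


def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


def nor(a: int, b: int) -> int:
    return 0 if (a or b) else 1


def nor_latch(s: int, r: int, q: int, qb: int) -> State:
    return nor(r, qb), nor(s, q)


def nand_latch(s_n: int, r_n: int, q: int, qb: int) -> State:
    """Cross-coupled NANDs with active-low inputs S' and R'."""
    return nand(s_n, qb), nand(r_n, q)


def settle(step: Callable[[int, int, int, int], State], a: int, b: int,
           q: int, qb: int, max_steps: int = 10) -> Optional[State]:
    for _ in range(max_steps):
        nq, nqb = step(a, b, q, qb)
        if (nq, nqb) == (q, qb):
            return q, qb
        q, qb = nq, nqb
    return None


def equivalent_on_allowed_inputs() -> bool:
    """Compare NOR(S, R) with NAND(S', R') for hold, set, and reset from both states."""
    for s, r in ((0, 0), (1, 0), (0, 1)):
        for start in ((0, 1), (1, 0)):
            if settle(nor_latch, s, r, *start) != settle(nand_latch, 1 - s, 1 - r, *start):
                return False
    return True


print(equivalent_on_allowed_inputs())
```

The two latches agree for hold, set, and reset. They differ only when both inputs are asserted: the NOR latch is forced to QQ’ = 00, the NAND latch to QQ’ = 11.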
Level-Sensitive SR Latch
• The input “C” works as an “enable” signal: the latch only changes its output when C is high.
• The input NANDs invert S and R
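A minimal behavioral sketch of this gated latch, assuming the structure described above: two input NANDs produce active-low S’ = NAND(S, C) and R’ = NAND(R, C), which drive a cross-coupled NAND pair. When C is low, both S’ and R’ are 1 and the pair simply holds. The function name `gated_sr_latch` is hypothetical.

```python
from typing import Tuple


def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1


def gated_sr_latch(s: int, r: int, c: int, q: int, qb: int,
                   max_steps: int = 10) -> Tuple[int, int]:
    """Settle a level-sensitive SR latch (input NANDs + cross-coupled NANDs)."""
    s_n = nand(s, c)  # input NANDs gate S and R with C and invert them
    r_n = nand(r, c)
    for _ in range(max_steps):
        nq, nqb = nand(s_n, qb), nand(r_n, q)
        if (nq, nqb) == (q, qb):
            return q, qb
        q, qb = nq, nqb
    raise RuntimeError("latch did not settle (S = R = 1 released together?)")


print(gated_sr_latch(1, 0, 0, 0, 1))  # C low: S is ignored, state held
print(gated_sr_latch(1, 0, 1, 0, 1))  # C high: set
```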
D-latch
Compare to transistor version:
Flip-flops
Note: The terms “master” and “slave” are no longer acceptable in this new era of Diversity, Equity, and Inclusion.
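An edge-triggered flip-flop can be built from two level-sensitive latches of opposite polarity, here called the first and second latch. The sketch below is a behavioral model under simplifying assumptions (zero delay, one evaluation per sample); the class names are hypothetical.

```python
from typing import List


class DLatch:
    """Positive level-sensitive D latch: transparent while en is high, holds while low."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, en: bool) -> int:
        if en:
            self.q = d
        return self.q


class DFlipFlop:
    """Positive-edge-triggered D flip-flop built from two level-sensitive latches.

    The first latch is transparent while clk is low (negative level-sensitive),
    the second while clk is high (positive level-sensitive). At no clock level
    are both transparent, so D can only reach Q at a rising edge.
    """

    def __init__(self) -> None:
        self.first = DLatch()
        self.second = DLatch()

    def update(self, d: int, clk: int) -> int:
        mid = self.first.update(d, not clk)
        return self.second.update(mid, bool(clk))


ff = DFlipFlop()
trace: List[int] = [ff.update(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]]
print(trace)  # [0, 1, 1, 1, 0]
```

The change of D while clk is high (third sample) does not reach Q; Q only updates at the rising edges (second and fifth samples).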
J-K FF
• Add logic to eliminate the “indeterminate” action of the RS FF.
• The new action is “toggle”
• J = “jam”
• K = “kill”
[Figure: JK FF with inputs J, K, clk and output Q]
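The JK behavior can be summarized by the characteristic equation Q+ = J·Q’ + K’·Q: J alone sets, K alone resets, neither holds, and both together toggle (where the RS FF would be indeterminate). A minimal sketch; `jk_next` is a hypothetical name.

```python
from typing import List


def jk_next(j: int, k: int, q: int) -> int:
    """JK flip-flop next state at the active clock edge: Q+ = J*Q' + K'*Q."""
    return (j & (1 - q)) | ((1 - k) & q)


# With J = K = 1 the flip-flop toggles on every edge (divide-by-two counter).
q = 0
toggles: List[int] = []
for _ in range(4):
    q = jk_next(1, 1, q)
    toggles.append(q)
print(toggles)  # [1, 0, 1, 0]
```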
Storage Element Taxonomy
            Synchronous,       Synchronous,      Asynchronous
            level-sensitive    edge-triggered
            (“latch”)          (“flip-flop”)     (“latch”)
D-type      natural            possible          n.a.
JK-type     possible           possible          n.a.
RS-type     possible           possible          natural
natural = “natural” form; possible = “possible” form
Design Example with RS FF
‣ With D-type FF state elements, the new state is computed from the inputs and present state bits, and reloaded each cycle.
‣ With RS (or JK) FF state elements, inputs are used to determine the conditions under which to set or reset state bits.
‣ Example: bit-serial adder (LSB first)
[Figure: bit-serial adder with a D-FF for the carry]
Bit-Serial Adder with RS FF
‣ The RS FF stores the carry:
[Table: full-adder truth table with columns a, b, c_i, c_i+1, s]
‣ Carry kill: a’b’
‣ Carry generate: ab
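A behavioral sketch of the RS-FF version of the bit-serial adder: each cycle, carry generate (ab) sets the carry FF, carry kill (a’b’) resets it, and when a ≠ b the carry propagates, so the FF simply holds. Since ab and a’b’ can never both be 1, the RS FF is never driven into its forbidden input combination. The helper names (`rs_ff_next`, `to_bits`, `from_bits`) and the assumption that the carry FF is reset before the first bit are illustrative.

```python
from typing import List


def rs_ff_next(s: int, r: int, q: int) -> int:
    """Clocked RS flip-flop: set on S, reset on R, hold otherwise."""
    assert not (s and r), "S and R must not both be asserted"
    if s:
        return 1
    if r:
        return 0
    return q


def bit_serial_add(a_bits: List[int], b_bits: List[int]) -> List[int]:
    """Add two equal-length numbers presented LSB first, one bit per cycle."""
    carry = 0  # carry FF, assumed reset before the first bit
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        # Carry generate (ab) sets, carry kill (a'b') resets, otherwise hold.
        carry = rs_ff_next(s=a & b, r=(1 - a) & (1 - b), q=carry)
    sum_bits.append(carry)  # final carry out
    return sum_bits


def to_bits(n: int, width: int) -> List[int]:
    return [(n >> i) & 1 for i in range(width)]  # LSB first


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(11, 4), to_bits(6, 4))))  # 17
```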
Writing a Latch 4
‣ Power down the inverters (to avoid the fight)
‣ High differential gain
‣ Fast (reduced voltage swings on input)
‣ Robust (common-mode noise rejection)
‣ Often used in sensing circuits
[Figure: latch circuits with D and Enable inputs]
Positive level-sensitive: transparent high, latching low
Negative level-sensitive: transparent low, latching high
Sky130 D-Latch
Sky130 Flip-Flop
Shifters, Latches for FPGAs
‣ Homework assignment: What should we use for shifters and latches?
‣ We want small, fast, reliable.
‣ Minimizing the number of transistors usually minimizes the area.
‣ Latch uses in:
‣ Shifters
‣ PIPs, LUT function
‣ Flip-flops (CLBs): need reset/set
‣ We can choose full custom or introduce new cells into the std cell library.
[Figure: configuration shifter, PIP, LUT, and a flip-flop built from two level-sensitive latches]
Multiplexors
‣ The next most important circuit in FPGAs
‣ LUT implementation, options in CLBs, connection boxes
‣ Often a 2-to-1 is sufficient, or serves as the building block for larger multiplexors
‣ Home assignment: extract and draw the circuit diagram for the Sky130 2-to-1 multiplexor:
‣ Discuss alternatives for FPGAs
2-to-1 multiplexor: C = sa + s’b
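To connect the 2-to-1 multiplexor to LUT implementation, here is a sketch of a k-input LUT built as a binary tree of `mux2` cells, where the leaves are the 2^k configuration bits. The bit ordering (`inputs[0]` as the least significant select bit, applied at the leaves) is an assumption for illustration; real FPGA LUTs may order and buffer the tree differently.

```python
from typing import Sequence


def mux2(s: int, a: int, b: int) -> int:
    """2-to-1 multiplexor, C = s*a + s'*b (a is selected when s = 1)."""
    return (s & a) | ((1 - s) & b)


def lut(config: Sequence[int], inputs: Sequence[int]) -> int:
    """k-input LUT as a tree of 2:1 muxes over 2**k configuration bits.

    config[i] is the output for the input combination whose binary value is i,
    with inputs[0] as the least significant bit.
    """
    assert len(config) == 2 ** len(inputs)
    level = list(config)
    for sel in inputs:  # the LSB selects first, at the leaves of the tree
        level = [mux2(sel, level[i + 1], level[i]) for i in range(0, len(level), 2)]
    return level[0]


xor2 = [0, 1, 1, 0]                   # 2-input XOR
majority3 = [0, 0, 0, 1, 0, 1, 1, 1]  # 3-input majority
print(lut(xor2, [1, 0]), lut(majority3, [1, 1, 0]))
```

A k-input LUT built this way uses 2^k − 1 two-input muxes, which is one reason the 2-to-1 mux circuit matters so much for FPGA area.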
6-Transistor SRAM (Static RAM)
‣ Large on-chip memories are built from arrays of static RAM bitcells, where each bitcell holds a bistable (cross-coupled inverters) and two access transistors.
‣ Other clocking and access logic is factored out into the periphery
[Figure: 6T bitcell with complementary bitlines (Bit, Bit) and a Wordline]
SRAM Block Example
6T-SRAM: Layout
[Figure: 6T cell layout with VDD, GND, WL, BL, BLB]
VDD and GND: M1
Bitlines: M2
Wordline: polysilicon
65nm SRAM
‣ ST/Philips/Motorola
[Figure: cell layout marking the access transistor, pull-down, and pull-up]
6T SRAM Cell Layouts
General SRAM Structure
[Figure: address decode and wordline driver; bitline prechargers; differential read sense amplifiers; differential write drivers. Signals: Address, Clk, Write Enable, Write Data, Read Data]
Usually a maximum of 128-256 bits per row or column
Address Decoder Structure
[Figure: address bits A1A0 and A3A2 each feed a 2:4 predecoder (one-hot, 1-of-4 encoding); a clocked word line enable gates Word Line 0, Word Line 1, …, Word Line 15]
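A behavioral sketch of the predecoded 4:16 decoder in the figure: each 2:4 predecoder turns a 2-bit address field into a one-hot, 1-of-4 code, and word line j is the AND of the clocked enable with one line from each predecoder (j = 4·(A3A2) + (A1A0)). With predecoding, each final word-line gate needs only three inputs instead of four address literals plus the enable. The function names are hypothetical.

```python
from typing import List


def predecode_2to4(a_hi: int, a_lo: int) -> List[int]:
    """2:4 predecoder: one-hot (1-of-4) encoding of a 2-bit address field."""
    value = (a_hi << 1) | a_lo
    return [1 if i == value else 0 for i in range(4)]


def decode_4to16(addr: int, wl_enable: int) -> List[int]:
    """Word line j = enable AND (A1A0 predecoder line) AND (A3A2 predecoder line)."""
    bits = [(addr >> i) & 1 for i in range(4)]  # A0..A3
    low = predecode_2to4(bits[1], bits[0])      # from A1A0
    high = predecode_2to4(bits[3], bits[2])     # from A3A2
    return [wl_enable & high[j >> 2] & low[j & 3] for j in range(16)]


print(decode_4to16(9, 1).index(1))  # 9
```

Holding the enable low keeps every word line off, which is how the clocked enable times the wordline pulse.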
Read Cycle
1) Precharge bitlines and sense amp
2) Pulse wordlines, develop bitline differential voltage
3) Disconnect bitlines from sense amp, activate sense pulldown, develop full-rail data signals
Pulses are generated by internal self-timed signals, often using “replica” circuits representing critical paths.
[Figure: waveforms for Clk, Wordline, Bit/Bit, Sense, and Data/Data showing the bitline differential and full-rail swing; circuit with prechargers, storage cells, sense amp, and output set-reset latch, driven by the wordline clock from the decoder]
Write Cycle
1) Precharge bitlines
2) Pull down one bitline full rail, open wordline
[Figure: waveforms for Clk, Wordline, and Bit/Bit; circuit with prechargers, storage cells, and write drivers controlled by Write Data and Write Enable, with the wordline clock from the decoder]
Write enable can be controlled at a per-bit level. If the bitlines are not driven during a write, the cell retains its value (it looks like a read to the cell).
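At the word level, per-bit write enable behaves like a masked write: bits whose enable is 1 take the new data, and bits whose bitlines are left undriven keep their old value. A minimal behavioral sketch; `SramModel` is a hypothetical class, not a memory-compiler model.

```python
from typing import List, Optional


class SramModel:
    """Behavioral SRAM array with per-bit write enable (no timing modeled)."""

    def __init__(self, words: int, width: int) -> None:
        self.mask = (1 << width) - 1
        self.rows: List[int] = [0] * words

    def write(self, addr: int, data: int, bit_enable: Optional[int] = None) -> None:
        # Enabled bits are driven with new data; undriven bits see a read and keep their value.
        en = self.mask if bit_enable is None else bit_enable & self.mask
        old = self.rows[addr]
        self.rows[addr] = (old & ~en) | (data & en)

    def read(self, addr: int) -> int:
        return self.rows[addr]


m = SramModel(words=4, width=8)
m.write(2, 0xAA)                    # full-word write
m.write(2, 0x0F, bit_enable=0x03)   # only bits 1..0 are written
print(hex(m.read(2)))               # 0xab
```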
SRAM Operation: Read
[Figure: 6T cell with WL and bitlines BL/BLB; stored values 1 and 0]
During a read, Q will get pulled up when WL first goes high, but …
• Reading the cell should not destroy the stored value
1. Bitlines are “pre-charged” to VDD
2. Wordline is driven high (pre-charger is turned off)
3. Cell pulls down one bitline
4. A differential sensing circuit on the periphery is activated to capture the value on the bitlines.
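A first-order estimate of step 3 treats the cell as a constant current I_cell discharging the precharged bitline capacitance C_BL, so the time to develop a sensing differential ΔV is t = C_BL·ΔV / I_cell. The sketch below applies that approximation. All numbers are hypothetical placeholders, not Sky130 or 65nm data: a 128-row column at an assumed 0.5 fF per cell plus 36 fF of fixed wiring, a 20 µA cell current, and a 100 mV sense threshold.

```python
def bitline_cap(rows: int, c_per_cell: float, c_fixed: float = 0.0) -> float:
    """Bitline capacitance grows with the number of cells hanging on the column."""
    return rows * c_per_cell + c_fixed


def time_to_differential(c_bl: float, delta_v: float, i_cell: float) -> float:
    """Constant-current approximation: dV/dt = I_cell / C_BL."""
    return c_bl * delta_v / i_cell


c_bl = bitline_cap(rows=128, c_per_cell=0.5e-15, c_fixed=36e-15)  # 100 fF (assumed)
t_sense = time_to_differential(c_bl, delta_v=0.1, i_cell=20e-6)    # small swing
t_full = time_to_differential(c_bl, delta_v=1.0, i_cell=20e-6)     # full 1 V swing
print(t_sense, t_full)
```

Under these assumptions, sensing a 100 mV differential takes about a tenth of the time a full 1 V swing would, and doubling the rows per column roughly doubles C_BL and the read time, consistent with limiting leaf arrays to 128-256 bits per column.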
CMOS SRAM Analysis (Read)
[Figure: 6T cell with transistors M1, M4, M5, M6, wordline WL, bitlines BL precharged to VDD with capacitance Cbit, and storage nodes Q = 1 and Q = 0]
SRAM Operation: Write
[Figure: 6T cell with WL and bitlines BL/BLB; storage nodes switching 1→0 and 0→1]
For a successful write, the access transistor needs to overpower the cell pull-up.
1. A column driver circuit on the periphery differentially drives the bitlines
2. Wordline is driven high (column driver stays on)
3. One side of the cell is driven low, which flips the other side
CMOS SRAM Analysis (Write)
[Figure: 6T cell with M1, M4, M5, M6 and WL; bitlines driven to BL = 1 and BL = 0; storage nodes Q = 0 and Q = 1]
Size the width ratio between the PMOS pull-up and the NMOS access transistor (W4/L4 vs. W6/L6).
Column-Muxing at Sense Amps
[Figure: column selects Sel0 and Sel1 connect one of several bitline pairs to a shared sense amp; Clk, wordline clock from the decoder, Data/Data outputs]
1) Each row of the array will include more than one logical word.
2) It is difficult to pitch-match a sense amp to the tight SRAM bit cell spacing, so often 2-8 columns share one sense amp. This impacts power dissipation, as multiple bitline pairs swing for each bit read.
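With column muxing, a logical word address splits into a row (which wordline fires) and a column select (which bitline pair each sense amp listens to). The sketch below assumes a bit-interleaved arrangement in which the `col_mux` columns sharing one sense amp are adjacent and the low address bits drive the column selects. That is one common arrangement, stated here as an assumption; the function names are hypothetical.

```python
from typing import Tuple


def array_shape(words: int, bits: int, col_mux: int) -> Tuple[int, int]:
    """Physical (rows, columns) when each row holds col_mux logical words."""
    assert words % col_mux == 0
    return words // col_mux, bits * col_mux


def locate(addr: int, bit: int, col_mux: int) -> Tuple[int, int]:
    """Physical (row, column) of data bit `bit` of word `addr` (bit-interleaved)."""
    row, col_sel = divmod(addr, col_mux)
    return row, bit * col_mux + col_sel


# 1024 x 32-bit memory with 4:1 column muxing: a squarer 256 x 128 array.
print(array_shape(1024, 32, 4))  # (256, 128)
print(locate(1023, 31, 4))       # (255, 127)
```

Every read still swings all bits × col_mux bitline pairs in the selected row, which is the power cost noted above.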
Building Larger Memories
Large arrays are constructed by tiling multiple leaf arrays, sharing decoders and I/O circuitry
e.g., a sense amp attached to arrays above and below
Leaf array size is limited to 128-256 bits per row/column due to the RC delay of wordlines and bitlines
Also to reduce power, by only activating the selected sub-bank
In larger memories, delay and energy are dominated by I/O wiring
[Figure: larger memory tiled from leaf arrays of bit cells, with shared decoders (Dec) and I/O]
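A rough way to see how the 128-256 bit leaf limit drives tiling: compute the full array shape after column muxing, then split rows and columns into enough leaf arrays to stay under a maximum dimension. This is a back-of-the-envelope sketch with a hypothetical function name and an assumed `max_dim = 256`; real memory compilers weigh many more factors (sub-banking, I/O wiring, redundancy).

```python
import math
from typing import Tuple


def leaf_tiling(words: int, bits: int, col_mux: int,
                max_dim: int = 256) -> Tuple[int, int, int, int]:
    """Return (leaf_rows, leaf_cols, tiles_vertical, tiles_horizontal)."""
    rows = words // col_mux  # wordlines in the full array
    cols = bits * col_mux    # bitline pairs in the full array
    tiles_v = math.ceil(rows / max_dim)
    tiles_h = math.ceil(cols / max_dim)
    return math.ceil(rows / tiles_v), math.ceil(cols / tiles_h), tiles_v, tiles_h


# 8192 x 64-bit (512 Kbit) with 4:1 column muxing -> eight 256 x 256 leaf arrays
print(leaf_tiling(8192, 64, 4))  # (256, 256, 8, 1)
```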
Adding More Ports
[Figure: bitcell with two differential read or write ports (WordlineA with BitA/BitA, WordlineB with BitB/BitB) and an optional single-ended read port (Wordline, Read Bitline)]
Memory Compilers
‣ In an ASIC flow, memory compilers are used to generate layout for SRAM blocks in the design
‣ Often hundreds of memory instances in a modern SoC
‣ Memory generators can also produce built-in self-test (BIST) logic, to speed manufacturing testing, and redundant rows/columns to improve yield
‣ The compiler can be parameterized by number of words, number of bits per word, desired aspect ratio, number of sub-banks, degree of column muxing, etc.
‣ Area, delay, and energy consumption are complex functions of the design parameters and the generation algorithm
‣ Worth experimenting with the design space
‣ Usually only single-port (one read or write) SRAM and one-read-one-write SRAM generators are in the ASIC library
Small Memories
‣ Compiled SRAM arrays usually have a high overhead due to peripheral circuits, BIST, and redundancy.
‣ Small memories are usually built from latches and/or flip-flops in a std cell flow
‣ The cross-over point is usually around 1 Kbit of storage
‣ Should try designing both ways
End of Lecture 12