1 Clockless Computing Montek Singh Thu, Sep 6, 2007 Review: Logic Gate Families A classic...
1
Clockless Computing
Montek Singh
Thu, Sep 6, 2007
Review: Logic Gate Families
A classic asynchronous pipeline by Williams
2
Review: Logic Gate Families
- Static CMOS logic (“standard”)
- Transmission gates, or “pass-transistor” logic
- Dynamic logic, or “domino” logic
3
Static CMOS logic: Summary
Advantages:
- output always strongly driven
  - pull-up and pull-down networks are fully complementary: exactly one of them is always “on”
  - good immunity to noise and leakage
- both inverting and non-inverting functions implementable
  - each single-stage gate is inverting
  - cascade two gates to get non-inverting logic
Disadvantages:
- slow/big PMOS devices needed (in addition to NMOS)
  - greater chip area
  - higher power consumption
  - slower switching speed
4
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Complementary CMOS logic gates
- nMOS pull-down network
- pMOS pull-up network
- a.k.a. static CMOS
[Figure: the inputs drive a pMOS pull-up network and an nMOS pull-down network, which share the output]

                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)
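The four-entry table can be read as a function of the two networks' states. A minimal Python sketch (the name `cmos_output` is hypothetical); a fully complementary gate only ever reaches the '0' and '1' entries:

```python
def cmos_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Output level of a gate, given which of its two networks conducts."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: VDD shorted to GND through both networks
    if pull_up_on:
        return "1"  # output connected to VDD
    if pull_down_on:
        return "0"  # output connected to GND
    return "Z"      # output floats: nothing drives it


# Print the table row by row (pull-down state selects the row)
for pd_on in (False, True):
    cells = [cmos_output(pu_on, pd_on) for pu_on in (False, True)]
    print(f"Pull-down {'ON ' if pd_on else 'OFF'}: {cells}")
```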
5
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Series and Parallel
- nMOS: 1 = ON
- pMOS: 0 = ON
- Series: both must be ON
- Parallel: either can be ON
[Figure: two-transistor networks between terminals a and b, with gate inputs g1 and g2, in four configurations]

g1 g2 | (a) nMOS series | (b) pMOS series | (c) nMOS parallel | (d) pMOS parallel
0  0  | OFF             | ON              | OFF               | ON
0  1  | OFF             | OFF             | ON                | ON
1  0  | OFF             | OFF             | ON                | ON
1  1  | ON              | OFF             | ON                | OFF
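The four panels follow from two rules: each device type's ON condition, and AND (series) versus OR (parallel). A short sketch that regenerates the table; the helper names and the (a)-(d) labelling follow the reconstructed table above and are illustrative only:

```python
from itertools import product


def nmos_on(g: int) -> bool:
    return g == 1  # nMOS: 1 = ON


def pmos_on(g: int) -> bool:
    return g == 0  # pMOS: 0 = ON


def series(x: bool, y: bool) -> bool:
    return x and y  # series: both must be ON


def parallel(x: bool, y: bool) -> bool:
    return x or y   # parallel: either can be ON


NETWORKS = {
    "(a) nMOS series":   lambda g1, g2: series(nmos_on(g1), nmos_on(g2)),
    "(b) pMOS series":   lambda g1, g2: series(pmos_on(g1), pmos_on(g2)),
    "(c) nMOS parallel": lambda g1, g2: parallel(nmos_on(g1), nmos_on(g2)),
    "(d) pMOS parallel": lambda g1, g2: parallel(pmos_on(g1), pmos_on(g2)),
}


def conduction_row(network) -> list[str]:
    """ON/OFF for (g1, g2) = 00, 01, 10, 11, in that order."""
    return ["ON" if network(g1, g2) else "OFF"
            for g1, g2 in product((0, 1), repeat=2)]


for name, network in NETWORKS.items():
    print(f"{name:18} {' '.join(conduction_row(network))}")
```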
6
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
CMOS Gate Design Activity:
- Sketch a 4-input CMOS NOR gate (inputs A, B, C, D; output Y)
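One way to check a NOR4 sketch is a switch-level model: four parallel nMOS in the pull-down and four series pMOS in the pull-up. The function `nor4` below is a hypothetical illustration, not a transistor-level netlist:

```python
from itertools import product


def nor4(a: int, b: int, c: int, d: int) -> int:
    inputs = (a, b, c, d)
    # Pull-down: four nMOS in parallel between Y and GND,
    # so any input at 1 gives a path to GND
    pull_down_on = any(x == 1 for x in inputs)
    # Pull-up: four pMOS in series between VDD and Y,
    # so all inputs must be 0 for a path to VDD
    pull_up_on = all(x == 0 for x in inputs)
    # Complementary networks: exactly one of them conducts
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0


# Check all 16 input combinations against the Boolean NOR
for inputs in product((0, 1), repeat=4):
    assert nor4(*inputs) == int(not any(inputs))
```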
7
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
CMOS Gate Design Activity:
- Sketch a 4-input CMOS NAND gate
8
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Conduction Complement
Complementary CMOS gates always produce 0 or 1
Ex: NAND gate
- Series nMOS: Y = 0 when both inputs are 1
- Thus Y = 1 when either input is 0
- Requires parallel pMOS
Rule of Conduction Complements
- Pull-up network is the complement of the pull-down network
- Parallel → series, series → parallel
[Figure: 2-input NAND gate with inputs A, B and output Y]
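The conduction-complement argument for NAND can be checked mechanically: the series-nMOS condition and the parallel-pMOS condition are never true together and never false together. A sketch with a hypothetical `nand2` function:

```python
from itertools import product


def nand2(a: int, b: int) -> int:
    # Pull-down: nMOS in series, so Y = 0 only when both inputs are 1
    pull_down_on = a == 1 and b == 1
    # Pull-up (conduction complement): pMOS in parallel,
    # so Y = 1 when either input is 0
    pull_up_on = a == 0 or b == 0
    # Never Z (both off) and never X (both on): the output is always 0 or 1
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0


for a, b in product((0, 1), repeat=2):
    print(a, b, "->", nand2(a, b))
```

The same pattern answers the 4-input NAND activity: four nMOS in series and four pMOS in parallel.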
9
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Compound Gates
Compound gates can do any inverting function
Ex: $Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)
[Figure: panels (a)-(f) building the AOI22 nMOS pull-down and pMOS pull-up networks for inputs A, B, C, D and output Y]
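Applying the conduction-complement rule to AOI22: the pull-down is (A series B) in parallel with (C series D), so the pull-up is (A parallel B) in series with (C parallel D). A switch-level sketch (the name `aoi22` is illustrative) that checks this against Y = NOT(A·B + C·D):

```python
from itertools import product


def aoi22(a: int, b: int, c: int, d: int) -> int:
    # Pull-down: (A series B) in parallel with (C series D)
    pull_down_on = (a == 1 and b == 1) or (c == 1 and d == 1)
    # Pull-up by conduction complement: (A parallel B) in series with
    # (C parallel D); each pMOS conducts when its input is 0
    pull_up_on = (a == 0 or b == 0) and (c == 0 or d == 0)
    assert pull_up_on != pull_down_on
    return 1 if pull_up_on else 0


# Exhaustive check against Y = NOT(A.B + C.D)
for a, b, c, d in product((0, 1), repeat=4):
    assert aoi22(a, b, c, d) == int(not ((a and b) or (c and d)))
```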
10
Transmission (“Pass”) Gates
Key Idea:
- transistors used in a different configuration
- when switched on: instead of connecting the output to Vdd or Gnd, they connect the output to the input
Advantage:
- very efficient for implementing switches and multiplexers
Disadvantage:
- signal degradation unless both NFET and PFET passgates are used in a complementary configuration
11
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Pass Transistors
Transistors can be used as switches
[Figure: nMOS and pMOS pass transistors, each with gate g and terminals s and d]
12
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Pass Transistors
Transistors can be used as switches
[Figure: nMOS and pMOS switches (gate g, terminals s and d) shown open and closed for g = 0 and g = 1]

nMOS (ON when g = 1):
Input   Output
0       strong 0
1       degraded 1

pMOS (ON when g = 0):
Input   Output
0       degraded 0
1       strong 1
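The two small tables above can be encoded as lookup functions. A sketch with hypothetical names `nmos_pass` and `pmos_pass`; "degraded" means the passed level settles roughly one threshold voltage away from the rail:

```python
def nmos_pass(g: int, value: int) -> str:
    if g == 0:
        return "Z"  # switch open: d is not driven by s
    # nMOS is a good pull-down but a poor pull-up
    # (a 1 arrives roughly one threshold voltage below VDD)
    return "strong 0" if value == 0 else "degraded 1"


def pmos_pass(g: int, value: int) -> str:
    if g == 1:
        return "Z"  # switch open
    # pMOS is a good pull-up but a poor pull-down
    return "strong 1" if value == 1 else "degraded 0"


for value in (0, 1):
    print(value, "| nMOS:", nmos_pass(1, value), "| pMOS:", pmos_pass(0, value))
```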
13
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Transmission Gates
Single pass transistors produce degraded outputs
- pMOS good only for transmitting “1”
- nMOS good only for transmitting “0”
14
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Transmission Gates
- Single pass transistors produce degraded outputs
- Complementary transmission gates pass both 0 and 1 well
[Figure: transmission gate between a and b, an nMOS gated by g in parallel with a pMOS gated by gb; off when g = 0, gb = 1; on when g = 1, gb = 0]

Input   Output (g = 1, gb = 0)
0       strong 0
1       strong 1
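A transmission gate puts the two devices in parallel, so each value travels through the device that passes it strongly. A sketch (hypothetical `transmission_gate` function) that treats non-complementary controls as an error:

```python
def transmission_gate(g: int, gb: int, value: int) -> str:
    if (g, gb) == (1, 0):
        # Both devices on: the nMOS passes a strong 0, the pMOS a strong 1
        return f"strong {value}"
    if (g, gb) == (0, 1):
        return "Z"  # both devices off: b is not driven by a
    raise ValueError("g and gb must be complementary")


for value in (0, 1):
    print(value, "->", transmission_gate(1, 0, value))
```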
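15
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Multiplexers
2:1 multiplexer chooses between two inputs

S  D1  D0 | Y
0  X   0  | 0
0  X   1  | 1
1  0   X  | 0
1  1   X  | 1
(X = don't care)

[Figure: mux symbol with data inputs D0 (selected when S = 0) and D1 (selected when S = 1), select S, output Y]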
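16
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Transmission Gate Mux
Nonrestoring mux uses two transmission gates
- Only 4 transistors
[Figure: D0 and D1 each pass through a transmission gate controlled by S and its complement; the two gate outputs join at Y]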
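A behavioural sketch of the transmission-gate mux, checked against the 2:1 mux truth table. The gate wiring in the comments (D0's gate enabled by S', D1's by S) is one common arrangement and is assumed here; `tg_mux` is a hypothetical name:

```python
from itertools import product


def tg_mux(s: int, d0: int, d1: int) -> int:
    s_bar = 1 - s  # the complement of S, which the mux also needs
    # TG on the D0 path: nMOS gated by S', pMOS gated by S -> on when S = 0
    tg0_on = s_bar == 1 and s == 0
    # TG on the D1 path: nMOS gated by S, pMOS gated by S' -> on when S = 1
    tg1_on = s == 1 and s_bar == 0
    # Exactly one gate conducts, so Y is never floating or contended
    assert tg0_on != tg1_on
    # Nonrestoring: Y is connected to the selected input itself, not to VDD/GND
    return d0 if tg0_on else d1


# Check against the 2:1 mux truth table (X = don't care)
for s, d0, d1 in product((0, 1), repeat=3):
    assert tg_mux(s, d0, d1) == (d1 if s == 1 else d0)
```

Because the output is driven through the switches rather than from the supply rails, any degradation on D0 or D1 is passed straight on to Y; that is what "nonrestoring" means here.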
17
OPTIONAL MATERIAL
Credit: David Harris, Harvey Mudd College
Gate-Level Mux Design
$Y = S \cdot D_1 + \overline{S} \cdot D_0$ (too many transistors)
How many transistors are needed? 20
[Figure: gate-level mux implementations with inputs D0, D1, S and output Y]
18
Dynamic Logic, or “domino”
Key idea:
- only use NMOS transistors to compute the function
- use a single PMOS to reset
Advantages:
- significantly fewer transistors
  - smaller chip area
  - higher speed, lower power
- less “loading” on wires (they drive fewer transistors)
- for async: no storage elements needed
Disadvantages:
- need an extra control input to precharge
- logic is typically non-inverting only
- more vulnerable to noise and leakage effects
19
Dynamic Logic, or “domino” (contd.)
Gate has 2 phases:
- precharge (= reset): output reset to ‘0’
- evaluate: output computed; it either stays ‘0’ or switches to ‘1’
Pull-up and pull-down must never be simultaneously active:
- ensure that data inputs are reset while the gate is precharging
- or, add a “footer” device
[Figure: dynamic gate; the control input PC drives the pull-up network, which controls “precharge”; the data inputs drive the pull-down network, which controls “evaluation”; the gate produces the data output]
- PC = 0 (asserted): precharge
- PC = 1 (de-asserted): evaluate
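A switch-level sketch of the two phases, assuming the usual output inverter that makes the gate's output reset to ‘0’. The class `DominoGate` and its `footer` flag are hypothetical; an unfooted gate whose pull-down conducts during precharge is reported as an error, mirroring the rule above:

```python
from typing import Callable, Sequence


class DominoGate:
    """Switch-level sketch of a dynamic gate with an output inverter."""

    def __init__(self, pull_down: Callable[[Sequence[int]], bool], footer: bool = False):
        self.pull_down = pull_down  # True when the nMOS network conducts
        self.footer = footer        # footer nMOS cuts the path to GND during precharge
        self.node = 1               # dynamic node, assumed to start precharged

    @property
    def output(self) -> int:
        return 1 - self.node        # output inverter: precharged node -> output '0'

    def step(self, pc: int, inputs: Sequence[int]) -> int:
        if pc == 0:
            # Precharge (PC asserted): the PMOS pulls the node high, output resets to '0'
            if self.pull_down(inputs) and not self.footer:
                raise RuntimeError("pull-up and pull-down both active during precharge")
            self.node = 1
        elif self.pull_down(inputs):
            # Evaluate (PC de-asserted): a conducting pull-down discharges the node
            self.node = 0
        # Otherwise the node keeps its charge, so the output either stays '0' or,
        # once it has switched to '1', stays there until the next precharge.
        return self.output


and2 = DominoGate(lambda x: x[0] == 1 and x[1] == 1)  # 2-input AND (non-inverting)
print(and2.step(0, (0, 0)), and2.step(1, (1, 1)), and2.step(1, (0, 0)))
```

The last call shows the monotonic behaviour: after evaluating to ‘1’, the output stays ‘1’ even when the inputs drop, until the next precharge.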
20
Outline: Several Pipeline Styles
- Classic static logic pipeline: Sutherland
- Recent static logic pipeline: MOUSETRAP
- Classic dynamic logic pipeline: Williams/Horowitz’ PS0
21
A Classic Asynchronous Dynamic Pipeline
Williams and Horowitz’s PS0 pipeline:
- Structure
- Operation
- Performance
22
A Classic Approach: PS0 Pipeline
Williams/Horowitz (Stanford U.) [1986-91]:
- successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
- implemented using “dynamic logic”
[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) from “Data in” to “Data out”; each stage has a processing block and a completion detector; data flows forward and an ack flows back to the previous stage]
23
PS0 Pipeline Stage
A PS0 stage consists of dynamic gates and a completion detector:
[Figure: processing block of dynamic gates (PC-controlled precharge, pull-down network on the data inputs, “keeper” on the output) producing the data outputs; a completion detector on the outputs generates ack]
24
Dual-Rail Completion Detector
- Combines dual-rail signals
- Indicates when all bits are valid (or reset)
- OR together the 2 rails of each bit
- Merge the results using a “C-element”
[Figure: one OR gate per bit (bit 0, bit 1, ..., bit n) feeding a C-element whose output is Done]
C-element:
- if all inputs = 1, output 1
- if all inputs = 0, output 0
- else, maintain output value
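A behavioural sketch of the detector. It assumes a common dual-rail convention (not spelled out on the slide): (0, 0) is the reset/spacer value, exactly one rail high is a valid bit, and (1, 1) is unused. The names `CElement` and `completion_detector` are hypothetical:

```python
from typing import Sequence, Tuple


class CElement:
    """Muller C-element: the output changes only when all inputs agree."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, inputs: Sequence[int]) -> int:
        if all(x == 1 for x in inputs):
            self.out = 1
        elif all(x == 0 for x in inputs):
            self.out = 0
        # Mixed inputs: maintain the previous output value
        return self.out


def completion_detector(c: CElement, bits: Sequence[Tuple[int, int]]) -> int:
    """bits holds one (rail_1, rail_0) pair per bit; returns Done."""
    # OR the 2 rails of each bit: 1 once the bit is valid, 0 once it is reset
    per_bit = [r1 | r0 for r1, r0 in bits]
    return c.update(per_bit)


done = CElement()
print(completion_detector(done, [(1, 0), (0, 0)]))  # one bit still reset: Done holds 0
print(completion_detector(done, [(1, 0), (0, 1)]))  # all bits valid: Done rises
```

The hysteresis is the point of using a C-element: Done rises only after every bit is valid and falls only after every bit has reset, so partial results never look complete.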
25
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
- delete data: after the next stage has copied it
EVALUATE N: when N+1 completes precharging
- accept new data: after the next stage is emptied
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
[Figure: stages N, N+1, N+2 with numbered events 1-6: stages evaluate, completion detectors indicate “done”, and a stage precharges]
26
PS0 Performance
$T_{EVAL}$: Evaluation Time
$T_{PRECH}$: Precharge Time
$T_{DETECT}$: Completion Detection Time
[Figure: the six numbered PS0 protocol events]
$\text{Cycle Time} = 3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
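The formula can be read as a sum over the six events of one cycle. The event-to-stage assignment in this sketch is an assumption, chosen to match the six numbered events and the formula. Stage N evaluates. Stages N+1 and N+2 then evaluate. N+2's detector signals “done”, which lets N+1 precharge. N+1's detector then signals the reset, which lets N evaluate again. `ps0_cycle_time` is a hypothetical name:

```python
def ps0_cycle_time(t_eval: float, t_prech: float, t_detect: float) -> float:
    """Sum the six events of one PS0 cycle, starting when stage N evaluates."""
    events = [
        ("N evaluates", t_eval),
        ("N+1 evaluates", t_eval),
        ("N+2 evaluates", t_eval),
        ("N+2 indicates done", t_detect),  # N+2 done -> N+1 may precharge
        ("N+1 precharges", t_prech),
        ("N+1 indicates done", t_detect),  # N+1 reset -> N may evaluate again
    ]
    return sum(delay for _, delay in events)


print(ps0_cycle_time(t_eval=3, t_prech=2, t_detect=1))  # 3*3 + 2 + 2*1
```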
27
Summary: PS0 Pipelining
Datapaths are latch-free:
- dynamic gates themselves provide implicit latches
- +: chip area savings
- +: extremely low latency
Data items kept separate by control:
- stage deletes data: only after the next stage has copied it
- stage accepts new data: only if the next stage is empty
- distinct data items are always separated by “spacers”
Control is extremely simple:
- each controller = a single wire
- the completion detector directly controls the previous stage
- +: chip area savings
- +: low control overhead
28
Comparison to a Clocked Pipeline
How would you design the pipeline if you actually had a clock?
1. Replace handshaking with “magic clocking”
   - each stage gets its own clock
   - successive clocks are slightly skewed
   - essentially, a clocked simulation of asynchronous handshaking!
   - –: need multiple clock phases!
2. Use a single clock, but insert latches between stages
   - latches are simple, level-sensitive
   - consecutive stages receive complementary clock signals
[Figure: pipeline stages separated by latches clocked by Ck and Ck']
29
Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - at most 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
My research goals have been to:
- address both issues
- still maintain very low latency
30
Homework #4 (due Tue Sep 18)
1. Enumerate ALL of the timing assumptions inherent in Williams’ PS0 style
   - Assume all gate and wire delays can be arbitrary
   - For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)