
1

Clockless Computing

Montek Singh
Thu, Sep 6, 2007

- Review: Logic Gate Families
- A classic asynchronous pipeline by Williams

2

Review: Logic Gate Families

- Static CMOS logic (“standard”)
- Transmission gates, or “pass-transistor” logic
- Dynamic logic, or “domino” logic

3

Static CMOS logic: Summary

Advantages:
- output always strongly driven
  - pull-up and pull-down networks are fully complementary; in steady state, exactly one of them is “on”
  - good immunity from noise and leakage
- both inverting and non-inverting functions implementable
  - each gate is inverting
  - cascade two gates together to get non-inverting logic

Disadvantages:
- slow/big PMOS devices needed (in addition to NMOS), which leads to:
  - greater chip area
  - higher power consumption
  - slower switching speed

4 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Complementary CMOS

- Complementary CMOS logic gates
  - nMOS pull-down network
  - pMOS pull-up network
  - a.k.a. static CMOS

[Figure: the inputs drive both a pMOS pull-up network and an nMOS pull-down network, which meet at the output.]

                 Pull-up OFF    Pull-up ON
Pull-down OFF    Z (float)      1
Pull-down ON     0              X (crowbar)
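The four cells of this table can be written as a small lookup. The following is an illustrative Python sketch, and the function names `cmos_output` and `static_cmos_inverter` are hypothetical. It also shows that a complementary gate such as an inverter only ever reaches the 0 and 1 cells.

```python
def cmos_output(pull_up_on: bool, pull_down_on: bool) -> str:
    """Output-node value for one combination of network states (see table)."""
    if pull_up_on and pull_down_on:
        return "X"  # crowbar: Vdd and Gnd fight through both networks
    if pull_up_on:
        return "1"  # output pulled up to Vdd
    if pull_down_on:
        return "0"  # output pulled down to Gnd
    return "Z"      # neither network conducts: output floats


def static_cmos_inverter(a: int) -> str:
    """Simplest complementary gate: one pMOS pull-up, one nMOS pull-down."""
    pull_down_on = a == 1  # nMOS conducts when its gate is 1
    pull_up_on = a == 0    # pMOS conducts when its gate is 0
    return cmos_output(pull_up_on, pull_down_on)


for up in (False, True):
    for down in (False, True):
        print(f"pull-up ON={up!s:5} pull-down ON={down!s:5} -> {cmos_output(up, down)}")
print([static_cmos_inverter(a) for a in (0, 1)])  # ['1', '0']
```

Because the inverter's two networks are driven by the same input with opposite conduction rules, the Z and X cells are unreachable. This is the property the previous slide calls "output always strongly driven".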

5 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Series and Parallel

- nMOS: 1 = ON
- pMOS: 0 = ON
- Series: both must be ON
- Parallel: either can be ON

[Figure: two-transistor networks between terminals a and b, with gate inputs g1 and g2.] Conduction for each input combination:

g1 g2 =                 00    01    10    11
(a) nMOS in series      OFF   OFF   OFF   ON
(b) pMOS in series      ON    OFF   OFF   OFF
(c) nMOS in parallel    OFF   ON    ON    ON
(d) pMOS in parallel    ON    ON    ON    OFF
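The four rows of this table follow from two rules: the per-device "ON" condition and the series/parallel combination. The following Python sketch (helper names are hypothetical) derives them for the columns g1 g2 = 00, 01, 10, 11.

```python
from itertools import product


def is_on(kind: str, gate: int) -> bool:
    """nMOS: 1 = ON; pMOS: 0 = ON."""
    return gate == 1 if kind == "nMOS" else gate == 0


def conducts(kind: str, topology: str, g1: int, g2: int) -> bool:
    """Does a two-transistor network (series or parallel) connect a to b?"""
    on1, on2 = is_on(kind, g1), is_on(kind, g2)
    if topology == "series":
        return on1 and on2  # current must flow through both devices
    return on1 or on2       # parallel: either device provides a path


def row(kind: str, topology: str) -> list:
    """ON/OFF for (g1, g2) = 00, 01, 10, 11, matching the table's columns."""
    return ["ON" if conducts(kind, topology, g1, g2) else "OFF"
            for g1, g2 in product((0, 1), repeat=2)]


for label, kind, topology in [("(a)", "nMOS", "series"), ("(b)", "pMOS", "series"),
                              ("(c)", "nMOS", "parallel"), ("(d)", "pMOS", "parallel")]:
    print(label, kind, topology, row(kind, topology))
```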

6 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

CMOS Gate Design Activity:

- Sketch a 4-input CMOS NOR gate (inputs A, B, C, D; output Y)

7 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

CMOS Gate Design Activity:

- Sketch a 4-input CMOS NAND gate

8 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Conduction Complement

- Complementary CMOS gates always produce 0 or 1
- Ex: NAND gate
  - Series nMOS: Y = 0 when both inputs are 1
  - Thus Y = 1 when either input is 0
  - Requires parallel pMOS
- Rule of Conduction Complements
  - Pull-up network is complement of pull-down
  - Parallel -> series, series -> parallel

[Figure: 2-input NAND gate with inputs A, B and output Y.]
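The rule of conduction complements can be applied mechanically: swap series and parallel to obtain the pull-up from the pull-down. The following Python sketch uses a hypothetical nested-tuple encoding of switch networks. It builds the dual and checks, for every input, that exactly one network conducts.

```python
from itertools import product

# Hypothetical encoding for illustration: a switch network is either an
# input name (a single transistor) or ("series" | "parallel", [subnetworks]).


def conducts(net, inputs, device):
    """True if the network offers a path; device is "n" (ON at 1) or "p" (ON at 0)."""
    if isinstance(net, str):
        return inputs[net] == (1 if device == "n" else 0)
    kind, parts = net
    paths = [conducts(part, inputs, device) for part in parts]
    return all(paths) if kind == "series" else any(paths)


def dual(net):
    """Rule of conduction complements: series <-> parallel, recursively."""
    if isinstance(net, str):
        return net
    kind, parts = net
    return ("parallel" if kind == "series" else "series", [dual(p) for p in parts])


def evaluate(pull_down, inputs):
    """Output of a complementary gate whose pMOS pull-up is dual(pull_down)."""
    down_on = conducts(pull_down, inputs, "n")
    up_on = conducts(dual(pull_down), inputs, "p")
    assert up_on != down_on, "exactly one network should conduct"
    return 1 if up_on else 0


def truth_table(pull_down, names):
    return {bits: evaluate(pull_down, dict(zip(names, bits)))
            for bits in product((0, 1), repeat=len(names))}


nand2 = ("series", ["A", "B"])  # series nMOS -> parallel pMOS
print(dual(nand2))               # ('parallel', ['A', 'B'])
print(truth_table(nand2, "AB"))  # {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

The 4-input gates from the activity slides can be built the same way. `("series", list("ABCD"))` is a NAND4 pull-down: four series nMOS, whose dual is four parallel pMOS. `("parallel", list("ABCD"))` is a NOR4 pull-down: four parallel nMOS, whose dual is four series pMOS.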

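9 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Compound Gates

- Compound gates can do any inverting function
- Ex: AND-AND-OR-INVERT (AOI22): $Y = \overline{A \cdot B + C \cdot D}$

[Figure, panels (a) to (f): step-by-step construction of the AOI22 pull-down and pull-up networks from inputs A, B, C, D, ending in the complete gate with output Y.]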
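For AOI22 the pull-down is two series pairs placed in parallel. By conduction complements, the pull-up is two parallel pairs placed in series. The following Python sketch (the function name is hypothetical) models both networks at the switch level and checks the result against the Boolean definition on all 16 inputs.

```python
from itertools import product


def aoi22(a: int, b: int, c: int, d: int) -> int:
    """Switch-level sketch of AOI22: Y = NOT(A*B + C*D)."""
    # nMOS pull-down: (A in series with B) in parallel with (C in series with D)
    pull_down_on = (a == 1 and b == 1) or (c == 1 and d == 1)
    # pMOS pull-up (ON at 0), the conduction complement:
    # (A parallel B) in series with (C parallel D)
    pull_up_on = (a == 0 or b == 0) and (c == 0 or d == 0)
    assert pull_up_on != pull_down_on  # complementary: exactly one conducts
    return 1 if pull_up_on else 0


# Check against the Boolean definition for all 16 input combinations.
for a, b, c, d in product((0, 1), repeat=4):
    assert aoi22(a, b, c, d) == (0 if (a and b) or (c and d) else 1)
print("AOI22 matches Y = NOT(A*B + C*D) on all 16 inputs")
```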

10

Transmission (“Pass”) Gates

Key Idea:
- transistors used in a different configuration
- when switched on: instead of connecting the output to Vdd or Gnd, they connect the output to the input

Advantage:
- very efficient for implementing switches and multiplexers

Disadvantage:
- signal degradation unless both NFET and PFET passgates are used in a complementary configuration

11 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Pass Transistors

- Transistors can be used as switches

[Figure: nMOS and pMOS pass transistors, each with gate g between terminals s and d.]

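12 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Pass Transistors

- Transistors can be used as switches

[Figure: each pass transistor drawn as an open or closed switch between s and d, depending on g.]

nMOS (OFF when g = 0, ON when g = 1):
  Input 0 -> Output strong 0
  Input 1 -> Output degraded 1

pMOS (OFF when g = 1, ON when g = 0):
  Input 0 -> Output degraded 0
  Input 1 -> Output strong 1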

13 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Transmission Gates

- Single pass transistors produce degraded outputs
  - pMOS good only for transmitting “1”
  - nMOS good only for transmitting “0”

14 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Transmission Gates

- Single pass transistors produce degraded outputs
- Complementary transmission gates pass both 0 and 1 well

[Figure: transmission gate between terminals a and b, with the nMOS gate driven by g and the pMOS gate by gb; several equivalent symbols are shown.]

g = 0, gb = 1: OFF (a and b disconnected)
g = 1, gb = 0: ON (a connected to b)
  Input 0 -> Output strong 0
  Input 1 -> Output strong 1
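The strong/degraded behavior on these three slides can be summarized in a few lines. The following Python sketch is a hypothetical logic-level model that tracks only the value and a "strong"/"degraded" label. It does not model real voltages. A transmission gate gives a strong result for both values because one of its two devices is always the "good" one for the value being passed.

```python
from typing import Optional, Tuple

Result = Optional[Tuple[str, int]]  # (strength, logic value), or None if OFF


def pass_nmos(g: int, value: int) -> Result:
    """nMOS switch: ON when g = 1; passes a strong 0 but a degraded 1."""
    if g != 1:
        return None
    # A passed 1 only rises to about Vdd - Vt, hence "degraded".
    return ("strong" if value == 0 else "degraded", value)


def pass_pmos(g: int, value: int) -> Result:
    """pMOS switch: ON when g = 0; passes a strong 1 but a degraded 0."""
    if g != 0:
        return None
    # A passed 0 only falls to about |Vtp|, hence "degraded".
    return ("strong" if value == 1 else "degraded", value)


def transmission_gate(g: int, gb: int, value: int) -> Result:
    """nMOS gated by g in parallel with pMOS gated by gb (normally gb = not g)."""
    paths = [r for r in (pass_nmos(g, value), pass_pmos(gb, value)) if r is not None]
    if not paths:
        return None  # g = 0, gb = 1: a and b disconnected
    strong = any(strength == "strong" for strength, _ in paths)
    return ("strong" if strong else "degraded", value)


for v in (0, 1):
    print(v, pass_nmos(1, v), pass_pmos(0, v), transmission_gate(1, 0, v))
```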

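15 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Multiplexers

- 2:1 multiplexer chooses between two inputs

S   D1   D0   Y
0   X    0    0
0   X    1    1
1   0    X    0
1   1    X    1

(X = don't care)

[Figure: mux symbol with data inputs D0 (selected when S = 0) and D1 (selected when S = 1), select input S, output Y.]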

16 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Transmission Gate Mux

- Nonrestoring mux uses two transmission gates
  - Only 4 transistors

[Figure: D0 and D1 each connect to Y through a transmission gate controlled by S and its complement.]
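The following Python sketch models the two-transmission-gate mux at the logic level and checks it against the 2:1 mux truth table, expanding every don't-care. The names are hypothetical. Because the mux is nonrestoring, Y is simply the selected input passed through and is not re-driven from Vdd or Gnd. This model captures only the logic value, not drive strength or delay.

```python
from itertools import product


def tg_on(g: int, gb: int) -> bool:
    """Transmission gate conducts if its nMOS (g = 1) or pMOS (gb = 0) is on."""
    return g == 1 or gb == 0


def tg_mux(s: int, d0: int, d1: int) -> int:
    """2:1 mux from two transmission gates driven by S and its complement."""
    s_bar = 1 - s
    drivers = []
    if tg_on(s_bar, s):  # passes D0 when S = 0
        drivers.append(d0)
    if tg_on(s, s_bar):  # passes D1 when S = 1
        drivers.append(d1)
    assert len(drivers) == 1, "complementary selects open exactly one path"
    return drivers[0]


# Truth table of the 2:1 mux as (S, D1, D0, Y); None marks a don't-care (X).
TABLE = [(0, None, 0, 0), (0, None, 1, 1), (1, 0, None, 0), (1, 1, None, 1)]

for s, d1, d0, y in TABLE:
    for d1_val, d0_val in product([d1] if d1 is not None else [0, 1],
                                  [d0] if d0 is not None else [0, 1]):
        assert tg_mux(s, d0_val, d1_val) == y
print("transmission-gate mux matches the truth table")
```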

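17 (OPTIONAL MATERIAL; credit: David Harris, Harvey Mudd College)

Gate-Level Mux Design

- $Y = S \cdot D_1 + \overline{S} \cdot D_0$ (too many transistors)
- How many transistors are needed? 20

[Figure: gate-level mux implementations with inputs D1, D0, S and output Y, each gate annotated with its transistor count.]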
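The sum-of-products equation can be checked against the mux definition directly. The following Python sketch also checks one common gate-level rewrite: by De Morgan's law, the two AND terms and the OR become two levels of NAND gates. Treat that NAND-NAND form as an assumption, since the slide's figure did not survive extraction. The sketch makes no claim about transistor counts.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def inv(a: int) -> int:
    return 1 - a


def mux_sop(s: int, d0: int, d1: int) -> int:
    # Y = S*D1 + S'*D0
    return (s & d1) | (inv(s) & d0)


def mux_nand_nand(s: int, d0: int, d1: int) -> int:
    # De Morgan: S*D1 + S'*D0 = NAND(NAND(S, D1), NAND(S', D0))
    return nand(nand(s, d1), nand(inv(s), d0))


for s, d0, d1 in product((0, 1), repeat=3):
    expected = d1 if s else d0  # definition of a 2:1 mux
    assert mux_sop(s, d0, d1) == expected
    assert mux_nand_nand(s, d0, d1) == expected
print("both forms implement the 2:1 mux")
```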

18

Dynamic Logic, or “domino”

Key idea:
- only use NMOS transistors to compute the function
- use a single PMOS to reset

Advantages:
- significantly fewer transistors
  - smaller chip area
  - higher speed, lower power
- less “loading” on wires (drive fewer transistors)
- for async: no storage elements needed

Disadvantages:
- need extra control input to precharge
- logic is typically non-inverting only
- more vulnerable to noise and leakage effects

19

Dynamic Logic, or “domino” (contd.)

Gate has 2 phases:
- precharge (= reset): output reset to ‘0’
- evaluate: output computed; either stays ‘0’, or switches to ‘1’

Pull-up and pull-down must never both be simultaneously active:
- ensure that data inputs are reset while gate is precharging
- or, add a “footer” device

[Figure: domino gate with a pull-up network and a pull-down network. The control input PC controls both “precharge” and “evaluation”; the data inputs drive the pull-down network; the gate produces the data output.]

PC = 0 (asserted): precharge
PC = 1 (de-asserted): evaluate
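The two-phase behavior can be illustrated with a small behavioral model. The following Python sketch assumes a footed gate (the second option above) and ignores the keeper, leakage and charge sharing. The class and method names are hypothetical. It shows why the output, once it rises during evaluate, stays at 1 until the next precharge, and why the logic is non-inverting.

```python
from typing import Callable, Dict

Inputs = Dict[str, int]


class DominoGate:
    """Behavioral sketch of a footed domino gate (hypothetical model).

    The dynamic node is precharged high through the PMOS while PC = 0; an
    output inverter then makes the gate output 0.  While PC = 1 the nMOS
    pull-down network may discharge the node, driving the output to 1.
    Nothing recharges the node until the next precharge, so once the output
    rises it stays 1 for the rest of the evaluate phase.
    """

    def __init__(self, pull_down: Callable[[Inputs], bool]):
        self.pull_down = pull_down  # True when the nMOS network conducts
        self.dynamic_node = 1

    def step(self, pc: int, inputs: Inputs) -> int:
        if pc == 0:
            # Precharge: PMOS on, footer off, so no path to ground.
            self.dynamic_node = 1
        elif self.pull_down(inputs):
            # Evaluate: footer on and the network conducts, so discharge.
            self.dynamic_node = 0
        return 1 - self.dynamic_node  # output inverter


and2 = DominoGate(lambda x: x["a"] == 1 and x["b"] == 1)
trace = [
    and2.step(0, {"a": 0, "b": 0}),  # precharge: output reset to 0
    and2.step(1, {"a": 1, "b": 0}),  # evaluate, no path: stays 0
    and2.step(1, {"a": 1, "b": 1}),  # path to ground: switches to 1
    and2.step(1, {"a": 0, "b": 0}),  # inputs fall, node cannot recharge: still 1
    and2.step(0, {"a": 0, "b": 0}),  # next precharge: back to 0
]
print(trace)  # [0, 0, 1, 1, 0]
```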

20

Outline: Several Pipeline Styles

- Classic static logic pipeline: Sutherland
- Recent static logic pipeline: MOUSETRAP
- Classic dynamic logic pipeline: Williams/Horowitz’s PS0

21

A Classic Asynchronous Dynamic Pipeline

Williams and Horowitz’s PS0 pipeline:
- Structure
- Operation
- Performance

22

A Classic Approach: PS0 Pipeline

Williams/Horowitz (Stanford U.) [1986-91]:
- successfully used in fabricated chips [Stanford ’87] [HAL ’90s]
- implemented using “dynamic logic”

[Figure: three-stage pipeline (Stage 1, Stage 2, Stage 3) from “Data in” to “Data out”. Each stage has a Processing Block and a Completion Detector. Data flows forward, and each stage’s ack returns to the previous stage.]

23

PS0 Pipeline Stage

A PS0 stage consists of dynamic gates and a completion detector:

[Figure: the Processing Block is built from dynamic gates (pull-down network, “keeper”, precharge input PC) taking data inputs to data outputs. The Completion Detector produces ack.]

24

Dual-Rail Completion Detector

- Combines dual-rail signals
- Indicates when all bits are valid (or reset)
- OR together the 2 rails per bit
- Merge results using a “C-element”

[Figure: one OR gate per bit (bit 0, bit 1, ..., bit n), all feeding a C-element whose output is Done.]

C-element:
- if all inputs = 1, output 1
- if all inputs = 0, output 0
- else, maintain output value
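The following Python sketch shows the OR-per-bit plus C-element structure. Class and function names are hypothetical, and a single multi-input C-element stands in for whatever C-element tree a real design would use. Done rises only when every bit is valid and falls only when every bit has returned to the spacer (reset) state. In between, the C-element holds its value.

```python
class CElement:
    """Muller C-element: output follows the inputs when they all agree,
    otherwise holds its previous value."""

    def __init__(self, initial: int = 0):
        self.out = initial

    def update(self, inputs) -> int:
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out


def completion_detector(bits, c_elem: CElement) -> int:
    """bits: (true_rail, false_rail) pairs of a dual-rail word.

    OR of the two rails is 1 when the bit holds valid data, 0 when reset.
    """
    per_bit = []
    for t, f in bits:
        assert not (t and f), "both rails high is illegal in dual-rail encoding"
        per_bit.append(t | f)
    return c_elem.update(per_bit)


done = CElement()
print(completion_detector([(0, 0), (0, 0)], done))  # all reset -> 0
print(completion_detector([(1, 0), (0, 0)], done))  # partially valid -> holds 0
print(completion_detector([(1, 0), (0, 1)], done))  # all valid -> 1
print(completion_detector([(0, 0), (0, 1)], done))  # partially reset -> holds 1
print(completion_detector([(0, 0), (0, 0)], done))  # all reset -> 0
```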

25

PS0 Protocol

PRECHARGE N: when N+1 completes evaluation
- delete data: after next stage has copied it

EVALUATE N: when N+1 completes precharging
- accept new data: after next stage is emptied

[Figure: stages N, N+1, N+2 with six numbered events: (1) N evaluates, (2) N+1 evaluates, (3) N+2 evaluates, (4) N+2 indicates “done”, (5) N+1 precharges, (6) N+1 indicates “done”.]

- Evaluate -> Precharge: 3 events
- Precharge -> Evaluate: another 3 events
- Complete cycle: 6 events

26

PS0 Performance

- T_EVAL: Evaluation Time
- T_PRECH: Precharge Time
- T_DETECT: Completion Detection Time

[Figure: the six events of the PS0 protocol, numbered 1 to 6.]

$$\text{Cycle Time} = 3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$$
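The cycle-time formula can be reproduced with a small timing recurrence that applies the two protocol rules stage by stage. The following Python sketch is a hypothetical idealization. It assumes fixed delays, a left environment that always has the next token ready, and a right environment that consumes and resets instantly. It ignores wire delays and the timing assumptions the homework asks about. With at least three stages, the measured period equals the six-event loop: three evaluations, one precharge, two detections.

```python
def ps0_eval_times(n_stages, n_tokens, t_eval, t_prech, t_detect):
    """Return ev[i][k]: time stage i finishes evaluating token k.

    Idealized, hypothetical timing model (fixed delays, instant environments).
    """
    S = n_stages
    # Row S stands for the right environment.
    ev = [[0.0] * n_tokens for _ in range(S + 1)]
    pr = [[0.0] * n_tokens for _ in range(S + 1)]
    for k in range(n_tokens):
        for i in range(S):
            data_ready = ev[i - 1][k] if i > 0 else 0.0  # left env always ready
            if k == 0:
                start = data_ready  # all stages start out precharged
            else:
                start = max(data_ready,
                            pr[i + 1][k - 1] + t_detect,  # EVALUATE N: N+1 done precharging
                            pr[i][k - 1])                 # own precharge finished
            ev[i][k] = start + t_eval
        ev[S][k] = ev[S - 1][k]  # right environment consumes instantly
        pr[S][k] = ev[S][k]      # ...and resets instantly
        for i in range(S):
            # PRECHARGE N: when N+1 completes evaluation (plus its detection).
            pr[i][k] = max(ev[i + 1][k] + t_detect, ev[i][k]) + t_prech
    return ev


t_eval, t_prech, t_detect = 2, 1, 3
ev = ps0_eval_times(n_stages=4, n_tokens=10, t_eval=t_eval,
                    t_prech=t_prech, t_detect=t_detect)
period = ev[0][-1] - ev[0][-2]
print(period, 3 * t_eval + t_prech + 2 * t_detect)  # 13.0 13
```

With fewer than three stages, the instantly responding right environment shortens the loop, so the formula describes stages that have two real successors.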

27

Summary: PS0 Pipelining

Datapaths are latch-free:
- dynamic gates themselves provide implicit latches
- +: chip area savings
- +: extremely low latency

Data items kept separate by control:
- stage deletes data: only after next stage has copied it
- stage accepts new data: only if next stage is empty
- distinct data items always separated by “spacers”

Control is extremely simple:
- each controller = single wire
- completion detector directly controls previous stage
- +: chip area savings
- +: low control overhead

28

Comparison to a Clocked Pipeline

How would you design the pipeline if you actually had a clock?

1. Replace handshaking with “magic clocking”
   - each stage gets its own clock
   - successive clocks are slightly skewed
   - essentially, clocked simulation of asynchronous handshaking!
   - -: need multiple clock phases!
2. Use a single clock, but insert latches between stages
   - latches are simple, level-sensitive
   - consecutive stages receive complementary clock signals

[Figure: stages separated by latches clocked alternately by Ck and Ck’.]

29

Drawbacks of PS0 Pipelining

1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - at most 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer

My research goals have been:
- address both issues
- still maintain very low latency

30

Homework #4 (due Tue Sep 18)

1. Enumerate ALL of the timing assumptions inherent in Williams’ PS0 style
   - Assume all gate and wire delays can be arbitrary
   - For which scenarios can there be a malfunction?
2. Compare the cycle times of PS0 with an ideal clocked dynamic pipeline (slide #28)