Matthew W. Ernest Electrical, Computer and Systems Engineering Dept.
mwe/PHD/1
Critical ALU Path Optimization and Implementation in a
BiCMOS Process for Gigahertz Range Processors
Matthew W. Ernest
Electrical, Computer and Systems Engineering Dept.
Rensselaer Polytechnic Institute
Overview
• Motivation
• Parallel Prefixes and Carry Types
• HBT Digital Circuits
• Pseudo-carry Adder
• Future Directions
Motivation
“Speed has always been important otherwise one wouldn't need the computer.” - Seymour Cray
• Ubiquity
• Simplicity
• Complexity
Parallel Prefixes
• The class of problems in which the terms of a sequence are combined in order, each with the result of the previous operation
• Carry computation is an application of parallel prefix theory
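Given: $x_0, x_1, x_2, \dots, x_k$
Find: $x_0,\; x_0 \circ x_1,\; x_0 \circ x_1 \circ x_2,\; \dots,\; x_0 \circ x_1 \circ x_2 \circ \dots \circ x_k$, where $\circ$ is an associative operator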
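As an illustration, the sketch below computes the prefixes $x_0$, $x_0 \circ x_1$, and so on in two ways for an arbitrary associative operator `op`: serially, like a ripple chain, and with a log-depth recursive-doubling scheme of the Kogge-Stone kind, where every update within one level is independent. The function names are hypothetical; string concatenation is used in the check only because it is associative but not commutative, which exposes operand-order mistakes.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def serial_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """y_k = x_0 o x_1 o ... o x_k, one step after another (depth grows with n)."""
    out: List[T] = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def log_depth_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Same prefixes in ceil(log2 n) levels by recursive doubling (Kogge-Stone style)."""
    ys = list(xs)
    d = 1
    while d < len(ys):
        # Each new value reads only the previous level, so a whole level could be
        # evaluated in parallel. ys[k - d] covers the earlier terms, so it stays
        # on the left of op, which matters when op is not commutative.
        ys = [ys[k] if k < d else op(ys[k - d], ys[k]) for k in range(len(ys))]
        d *= 2
    return ys
```

For a carry chain, `op` becomes the (generate, propagate) combining rule introduced with block carry look-ahead.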
Carry Types: Carry Select
• Compute possible results in parallel
• Select when the actual carry-in is available
• Requires internal carry for blocks, e.g. ripple
• Delay: $O(f(n/b) + b)$, min. $O(n^{1/2})$
• Area: $O(f(n/b) \cdot b + b)$, approx. $2n$
• Affected by block sizing
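A small functional model of carry select, with hypothetical names: each of `blocks` ripple blocks computes its sum for carry-in 0 and 1, and the real carry only drives a 2:1 select. The unit-delay estimate assumes one delay per ripple bit inside a block plus one per select stage, which is the $O(f(n/b) + b)$ form above with $f$ taken as ripple.

```python
import math
from typing import Tuple


def ripple_add(a: int, b: int, cin: int, width: int) -> Tuple[int, int]:
    """Ripple-carry add of two width-bit values; returns (sum, carry_out)."""
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c


def carry_select_add(a: int, b: int, cin: int, n: int, blocks: int) -> Tuple[int, int]:
    """n-bit carry-select add split into `blocks` ripple blocks."""
    size = math.ceil(n / blocks)
    total, c = 0, cin
    for lo in range(0, n, size):
        w = min(size, n - lo)
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Both candidate results exist before the block's real carry-in is known.
        res0 = ripple_add(a_blk, b_blk, 0, w)
        res1 = ripple_add(a_blk, b_blk, 1, w)
        # The incoming carry only drives a 2:1 select.
        s, c = res1 if c else res0
        total |= s << lo
    return total, c


def carry_select_delay(n: int, blocks: int) -> int:
    """Unit-delay estimate: ripple through one block, then one select per block."""
    return math.ceil(n / blocks) + blocks
```

Under this model the best block count sits near $\sqrt{n}$; for example, the minimum of `carry_select_delay(32, b)` over all `b` is 12 unit delays, consistent with the $O(n^{1/2})$ minimum.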
Carry Types: Carry look-ahead
• Carry-out can be “generated” at current position or carry-in “propagated”
• Delay: $O(1)$
• Area: $O(n^2)$
• High fan-in/fan-out
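A bit-level model of the look-ahead expansion, with hypothetical names: each carry is a flat OR of AND terms $g_k \cdot p_{k+1} \cdots p_i$ plus $p_i \cdots p_0 \cdot c_{in}$, built from the inputs alone, so no carry waits on another. The number of terms and their widths grow with bit position, which is where the area and fan-in costs come from. Propagate is taken as $p_i = a_i + b_i$ (OR); $a_i \oplus b_i$ would give the same carries.

```python
from typing import List


def cla_carries(a: int, b: int, cin: int, n: int) -> List[int]:
    """Return [c_0, ..., c_n], where c_i is the carry into bit i and c_0 = cin."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]  # generate: a_i AND b_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # propagate: a_i OR b_i
    carries = [cin]
    for i in range(n):
        # c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 cin,
        # built from the inputs alone; no earlier carry is reused.
        c = 0
        for k in range(i, -1, -1):  # bit k generates ...
            term = g[k]
            for m in range(k + 1, i + 1):  # ... and bits k+1..i propagate
                term &= p[m]
            c |= term
        tail = cin
        for m in range(i + 1):  # every bit 0..i propagates the carry-in
            tail &= p[m]
        carries.append(c | tail)
    return carries
```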
Carry Types: Block carry look-ahead
• A block propagates a carry if all bits in the block propagate a carry
• A block generates a carry if some bit generates a carry and all more-significant bits in the block propagate it
• Delay: O(log n)
• Area: O(n log n)
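The two block rules amount to an associative operator on (generate, propagate) pairs, $(G_{hi}, P_{hi}) \circ (G_{lo}, P_{lo}) = (G_{hi} + P_{hi} \cdot G_{lo},\; P_{hi} \cdot P_{lo})$, so every carry can come out of a parallel prefix tree. The sketch below, with hypothetical names, uses a Kogge-Stone-style doubling tree (one of several possible tree shapes) and reports the number of combining levels, which is $\lceil \log_2(n+1) \rceil$ here because the carry-in occupies its own slot.

```python
from typing import List, Tuple

GP = Tuple[int, int]  # (generate, propagate) of a span of bits


def combine(hi: GP, lo: GP) -> GP:
    """Merge an upper span with the adjacent lower span."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_carries(a: int, b: int, cin: int, n: int) -> Tuple[List[int], int]:
    """Return ([carry into bit 0..n], number of combining levels)."""
    # Slot 0 holds the carry-in as a pure generate; slot k+1 holds bit k.
    gp: List[GP] = [(cin, 0)]
    gp += [(((a >> i) & (b >> i)) & 1, ((a >> i) | (b >> i)) & 1) for i in range(n)]
    levels, d = 0, 1
    while d < len(gp):
        gp = [gp[k] if k < d else combine(gp[k], gp[k - d]) for k in range(len(gp))]
        d *= 2
        levels += 1
    # gp[k] now spans slots 0..k, so its generate is the carry into bit k.
    return [g for g, _ in gp], levels
```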
Block carry look-ahead trees
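Carry vs. Pseudo-carry
$C_{out} = G_n + P_n \cdot G_{n-1} + \dots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$
If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so
$C_{out} = P_n \cdot G_n + P_n \cdot G_{n-1} + \dots + P_n \cdot P_{n-1} \cdots P_0 \cdot C_{in}$
$C_{out} = P_n \cdot (G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 \cdot C_{in}) = P_n \cdot H_n$
$H_n = G_n + G_{n-1} + \dots + P_{n-1} \cdots P_0 \cdot C_{in}$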
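The factorization can be checked exhaustively for small widths. The sketch below (hypothetical function name, 4-bit operands) evaluates the flat sum-of-products forms of the carry and of $H$ at every bit position and confirms $C = P \cdot H$ when $P = A + B$. With an exclusive-OR propagate the identity fails, because then $G \cdot P = 0$ rather than $G$.

```python
from itertools import product
from typing import Callable, List


def and_all(xs: List[int]) -> int:
    return int(all(xs))  # AND over an empty range is 1


def pseudo_carry_holds(n: int, propagate: Callable[[int, int], int]) -> bool:
    for a, b, cin in product(range(1 << n), range(1 << n), (0, 1)):
        A = [(a >> i) & 1 for i in range(n)]
        B = [(b >> i) & 1 for i in range(n)]
        G = [x & y for x, y in zip(A, B)]            # G = A·B
        P = [propagate(x, y) for x, y in zip(A, B)]  # P = A+B (or XOR, to compare)
        for k in range(n):
            # Carry out of bit k: G_k + P_k G_{k-1} + ... + P_k ... P_0 Cin
            c = and_all(P[:k + 1]) & cin
            for m in range(k + 1):
                c |= G[m] & and_all(P[m + 1:k + 1])
            # Pseudo-carry: H_k = G_k + G_{k-1} + P_{k-1} G_{k-2} + ... + P_{k-1} ... P_0 Cin
            h = G[k] | (and_all(P[:k]) & cin)
            for m in range(k):
                h |= G[m] & and_all(P[m + 1:k])
            if c != P[k] & h:
                return False
    return True
```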
Carry vs. Pseudo-carry
• Redundant terms create factorization opportunities
• Factorization moves terms from critical paths to non-critical paths
• Multiple paths can be parallelized
• Products with fewer terms lead to implementations with smaller, faster gates
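Deriving Block Pseudo-carry from Block Carry Look-ahead Terms
Block Generate:
$G^{i \cdot j}_0 = G^i_j + P^i_j \cdot G^i_{j-i} + \dots + P^i_j \cdot P^i_{j-i} \cdot P^i_{j-2i} \cdots G^i_0$
If $G = A \cdot B$ and $P = A + B$, then $G = G \cdot P$, so
$G^{i \cdot j}_0 = P^i_j \cdot G^i_j + P^i_j \cdot G^i_{j-i} + \dots + P^i_j \cdot P^i_{j-i} \cdot P^i_{j-2i} \cdots G^i_0$
$G^{i \cdot j}_0 = P^i_j \cdot (G^i_j + G^i_{j-i} + \dots + P^i_{j-i} \cdot P^i_{j-2i} \cdots G^i_0)$
$H^{i \cdot j}_0 = G^i_j + G^i_{j-i} + \dots + P^i_{j-i} \cdot P^i_{j-2i} \cdots G^i_0$
• Pseudo-carries can be generated in blocks like carries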
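Generalized Pseudocarry Equations
$H^2_s = G^1_{s+1} + G^1_s$
$H^{i+j}_s = H^j_{s+i} + I^j_{s+i-1} \cdot H^i_s$
$H^{i+j+k}_s = H^k_{s+i+j} + I^k_{s+i+j-1} \cdot H^j_{s+i} + I^k_{s+i+j-1} \cdot I^j_{s+i-1} \cdot H^i_s$
$I^{p+q}_t = I^q_{t+p} \cdot I^p_t$
$I^{p+q+r}_t = I^r_{t+p+q} \cdot I^q_{t+p} \cdot I^p_t$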
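The generalized equations can be spot-checked exhaustively. The sketch assumes, consistently with $H^2_s = G^1_{s+1} + G^1_s$, that $H^k_s$ over bits $s$ to $s+k-1$ equals $G$ of the top bit plus the ordinary block generate of the bits below it, and that $I^k_t$ is the AND of $P$ over bits $t$ to $t+k-1$. Those definitions and all function names are assumptions, chosen because they reproduce the two-bit case; the carry-in is left out.

```python
from itertools import product
from typing import List


def and_all(xs: List[int]) -> int:
    return int(all(xs))  # AND over an empty range is 1


def block_gen(G: List[int], P: List[int], lo: int, hi: int) -> int:
    """Ordinary block generate of bits lo..hi (0 for an empty range)."""
    out = 0
    for m in range(lo, hi + 1):
        out |= G[m] & and_all(P[m + 1:hi + 1])
    return out


def pseudo_h(G: List[int], P: List[int], size: int, s: int) -> int:
    """H^size_s: G of the top bit plus the generate of the bits below it."""
    top = s + size - 1
    return G[top] | block_gen(G, P, s, top - 1)


def prop_i(P: List[int], size: int, t: int) -> int:
    """I^size_t: AND of P over bits t..t+size-1."""
    return and_all(P[t:t + size])


def check_generalized(n: int = 6) -> bool:
    for a, b in product(range(1 << n), repeat=2):
        G = [((a >> k) & (b >> k)) & 1 for k in range(n)]  # G = A·B
        P = [((a >> k) | (b >> k)) & 1 for k in range(n)]  # P = A+B
        for s in range(n - 1):
            assert pseudo_h(G, P, 2, s) == G[s + 1] | G[s]
        for s in range(n):
            for i in range(1, n - s + 1):
                for j in range(1, n - s - i + 1):
                    two = pseudo_h(G, P, j, s + i) | (prop_i(P, j, s + i - 1) & pseudo_h(G, P, i, s))
                    assert pseudo_h(G, P, i + j, s) == two
                    for k in range(1, n - s - i - j + 1):
                        upper_i = prop_i(P, k, s + i + j - 1)
                        three = (pseudo_h(G, P, k, s + i + j)
                                 | (upper_i & pseudo_h(G, P, j, s + i))
                                 | (upper_i & prop_i(P, j, s + i - 1) & pseudo_h(G, P, i, s)))
                        assert pseudo_h(G, P, i + j + k, s) == three
    return True
```

The identity relies on $G \le P$ at each bit, the same condition $G = G \cdot P$ used in the bit-level derivation.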
Generating Sums Using Pseudocarry
$S_n = A_n \oplus B_n \oplus C_{n-1}$
If $T_n = A_n \oplus B_n$ and $C_m = P_m \cdot H_m$,
then $S_n = T_n \oplus P_{n-1} \cdot H_{n-1}$
• Sum with pseudo-carry no more complex than sum with carry
• Other look-ahead features still apply, e.g. Han-Carlson “every other carry”
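A bit-level sketch of the sum rule, assuming $G = A \cdot B$, $P = A + B$, and $T = A \oplus B$. For brevity the pseudo-carries come from the serial recurrence $H_k = G_k + P_{k-1} \cdot H_{k-1}$ with $H_0 = G_0 + C_{in}$, which follows from $H_k = G_k + C_{k-1}$ and $C_{k-1} = P_{k-1} \cdot H_{k-1}$; a look-ahead tree would produce the same $H$ values in fewer levels. The function name is hypothetical.

```python
from typing import Tuple


def pseudo_carry_add(a: int, b: int, cin: int, n: int) -> Tuple[int, int]:
    """n-bit add whose sum bits use S_k = T_k xor P_{k-1} H_{k-1}."""
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> i) & 1 for i in range(n)]
    G = [x & y for x, y in zip(A, B)]  # G = A·B
    P = [x | y for x, y in zip(A, B)]  # P = A+B
    T = [x ^ y for x, y in zip(A, B)]  # T = A xor B
    H = [0] * n
    H[0] = G[0] | cin  # H_0 = G_0 + C_in
    for k in range(1, n):
        H[k] = G[k] | (P[k - 1] & H[k - 1])  # H_k = G_k + P_{k-1} H_{k-1}
    s = T[0] ^ cin  # bit 0 sees the carry-in directly
    for k in range(1, n):
        s |= (T[k] ^ (P[k - 1] & H[k - 1])) << k  # S_k = T_k xor P_{k-1} H_{k-1}
    cout = P[n - 1] & H[n - 1]  # C_{n-1} = P_{n-1} H_{n-1}
    return s, cout
```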
Adder comparison
Bits  Ripple  CSel A  B   C   CLA  PCLA
32    32      12      12  9   6    5
64    64      20      16  12  7    6
HBT Digital Circuits
• Exponential I/V relationship leads to high gain and fast switching
• Vertical arrangement allows critical dimensions to be smaller with tighter tolerances
• Traditionally high DC power consumption; compare the increasing leakage and switching currents of FETs
Current Steering Logic
• Constant current source equals the combined emitter currents
• Ratio of current through each transistor is an exponential function of the base-voltage difference
• Difference in collector currents is converted to a difference in voltage across the pull-up resistors
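A first-order model of the current-steering pair, under stated assumptions: ideal bipolar transistors with $I_C \propto e^{V_{BE}/V_T}$, base currents and Early effect ignored, equal pull-up resistors. The exponential law makes the collector-current ratio $e^{\Delta V / V_T}$, so the tail current splits as $I_{C1} = I_{tail} / (1 + e^{-\Delta V / V_T})$. Function names and any example values are hypothetical.

```python
import math
from typing import Tuple

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """V_T = kT/q in volts (about 25.85 mV at 300 K)."""
    return K_B * temp_k / Q_E


def diff_pair(v_diff: float, i_tail: float, r_load: float,
              temp_k: float = 300.0) -> Tuple[float, float, float]:
    """Return (i_c1, i_c2, v_c1 - v_c2) for base-voltage difference v_diff."""
    vt = thermal_voltage(temp_k)
    i_c1 = i_tail / (1.0 + math.exp(-v_diff / vt))  # exponential current split
    i_c2 = i_tail - i_c1                            # currents sum to the tail current
    # Each collector sits at V_CC - R*I_C, so the differential output is -R*(i_c1 - i_c2).
    v_out_diff = -r_load * (i_c1 - i_c2)
    return i_c1, i_c2, v_out_diff
```

In this model a differential input of $4 V_T$ (roughly 100 mV at room temperature) already steers about 98% of the tail current to one side, which is why small input swings suffice.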
Single-ended vs. Double-ended
Single-ended:
• Limited to simple functions
• Large fan-in
Double-ended:
• Any function of inputs
• Fan-in limited by supply voltage
Look-ahead gate w/ fully differential logic
[Figure: fully differential look-ahead gate with complementary inputs $H_n$, $I_n$, $H_{n-1}$, $H_{n-2}$, $I_{n-1}$]
Mixed input look-ahead gates
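[Figure: gate with inputs $H_n$, $I_n$, $H_{n-1}$ and reference voltage $V_r$]
• $I_n \cdot (H_n + H_{n-1}) + \overline{I_n} \cdot H_n$
• $H_n + I_n \cdot H_{n-1}$
• Two series-gated levels for three inputs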
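Mixed input look-ahead gates
[Figure: gate with inputs $H_n$, $H_{n-1}$, $H_{n-2}$, $I_n$, $I_{n-1}$]
• $I_n \cdot I_{n-1} \cdot (H_n + H_{n-1} + H_{n-2}) + I_n \cdot \overline{I_{n-1}} \cdot (H_n + H_{n-1}) + \overline{I_n} \cdot H_n$
• $H_n + I_n \cdot H_{n-1} + I_n \cdot I_{n-1} \cdot H_{n-2}$
• Three series-gated levels for five inputs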
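A truth-table check of the two mixed-input gates, with hypothetical names: each gate is written as a Shannon expansion on $I_n$ (and, for the five-input gate, then on $I_{n-1}$), which is what the series-gated levels select between, and is compared with the compact look-ahead form. Reading the five-input expansion as the three-term form used here is an assumption.

```python
from itertools import product


def not_(x: int) -> int:
    return 1 - x


def gate3(h_n: int, h_n1: int, i_n: int) -> int:
    """I_n (H_n + H_{n-1}) + not(I_n) H_n."""
    return (i_n & (h_n | h_n1)) | (not_(i_n) & h_n)


def gate5(h_n: int, h_n1: int, h_n2: int, i_n: int, i_n1: int) -> int:
    """Expansion on I_n, then on I_{n-1} inside the I_n = 1 branch."""
    return ((i_n & i_n1 & (h_n | h_n1 | h_n2))
            | (i_n & not_(i_n1) & (h_n | h_n1))
            | (not_(i_n) & h_n))


def check_mixed_gates() -> bool:
    for h_n, h_n1, i_n in product((0, 1), repeat=3):
        if gate3(h_n, h_n1, i_n) != h_n | (i_n & h_n1):
            return False
    for h_n, h_n1, h_n2, i_n, i_n1 in product((0, 1), repeat=5):
        if gate5(h_n, h_n1, h_n2, i_n, i_n1) != h_n | (i_n & h_n1) | (i_n & i_n1 & h_n2):
            return False
    return True
```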
Pseudocarry Blocks
[Figure: pseudo-carry tree built from $H^2_s$ blocks, then $H^6_s$ blocks, then $H^{18}_s$ and $H^{14}_s$ blocks, ending in $H^{32}_s$]
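The generalized equations let pseudo-carry blocks of any sizes be merged, provided each block's $H$ travels with an $I$ taken one bit lower; $I^{p+q}_t = I^q_{t+p} \cdot I^p_t$ keeps that alignment when blocks merge. The sketch below computes a 32-bit carry-out through blocks of sizes 2, 6, 18/14, and 32, the sizes named in the pseudocarry block tree. The assignment of bits to blocks, the zero carry-in, and all names are assumptions for illustration, not the fabricated circuit.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (H of the block, I of the block shifted down one bit)


def merge(blocks: List[Block]) -> Block:
    """Merge adjacent blocks given low to high: H = H_hi + I_hi*H_lo, I = I_hi*I_lo."""
    h, i = blocks[0]
    for h_hi, i_hi in blocks[1:]:
        h, i = h_hi | (i_hi & h), i_hi & i
    return h, i


def tree_carry_out(a: int, b: int) -> int:
    """Carry out of a 32-bit add (carry-in assumed 0) through a hypothetical H-block tree."""
    A = [(a >> k) & 1 for k in range(32)]
    B = [(b >> k) & 1 for k in range(32)]
    G = [x & y for x, y in zip(A, B)]  # G = A·B
    P = [x | y for x, y in zip(A, B)]  # P = A+B
    p_below = [0] + P[:-1]  # P_{k-1}; a dummy 0 stands in below bit 0
    # Level 1: H^2_s = G_{s+1} + G_s, paired with I^2_{s-1} = P_{s-1} P_s.
    h2 = [(G[s + 1] | G[s], p_below[s] & P[s]) for s in range(0, 32, 2)]
    # Level 2: five H^6 blocks cover bits 0-29; h2[15] (bits 30-31) is left over.
    h6 = [merge(h2[3 * m:3 * m + 3]) for m in range(5)]
    # Level 3 (one possible grouping): H^18 over bits 0-17, H^14 over bits 18-31.
    h18 = merge(h6[0:3])
    h14 = merge([h6[3], h6[4], h2[15]])
    # Level 4: H^32 over all 32 bits, then C_31 = P_31 * H^32_0.
    h32, _ = merge([h18, h14])
    return P[31] & h32
```

The merge rule has the same shape as the (generate, propagate) operator of block carry look-ahead, so any grouping of adjacent blocks gives the same final $H^{32}_0$.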
Pseudocarry Tree Oscillator
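[Figure: pseudo-carry tree with 32-bit operand inputs A and B (bits 0 to 31), carry-in $C_{in}$, carry-out $C_{out}$, and a Select input]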
Carry Tree High-speed Output
2 x 165 ps
Breakdown of measured delay
• Devices: 71%
• Wire C: 12%
• Temperature: 6%
• Resistor model: 11%
Total measured delay = 165 ps
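Converting the shares into picoseconds puts the breakdown in the same units as the measurement: roughly 117 ps for devices, 20 ps for wire capacitance, 10 ps for temperature, and 18 ps for the resistor model. The sketch below only multiplies the listed percentages by 165 ps; the names are labels for illustration.

```python
from typing import Dict

TOTAL_PS = 165.0
# Shares of the measured delay, as listed on the slide.
SHARES = {"Devices": 0.71, "Wire C": 0.12, "Temperature": 0.06, "Resistor model": 0.11}


def breakdown_ps(total: float = TOTAL_PS, shares: Dict[str, float] = SHARES) -> Dict[str, float]:
    """Convert each fractional share of the total delay into picoseconds."""
    return {name: total * frac for name, frac in shares.items()}


for name, ps in breakdown_ps().items():
    print(f"{name}: {ps:.1f} ps")
```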
Loaded vs. unloaded toggling
• At design time, fT peaked at 1.2 mA/µm², but the current-density limit was 2 mA/µm²
• For some devices, the maximum frequency when driving a load can occur above the fT peak current
• The models supported this, and there was no reason at the time to doubt them
• However, models are never qualified above fT peak current!
Loaded vs. unloaded toggling
[Plot: buffer delay (0 to 80 ps) vs. tail current (0 to 2.5 mA)]
Resistor Model Effects
Resistor  Simulated (9805A)  Fabricated (99B)
Pull-up   444                528
Tail      1000               1091
Model parameter variation
[Chart: parameter values (0 to 500 ohms) of RB, RE, and RC across design kits 9708A, 9802, 9805, 1999B, and v2.3; DARPA02 Design and DARPA02 Fabrication marked]
Cadence internal parasitic methods
• Approximates all capacitance as polynomial function of distance between conductors
• Cannot extract RC and capacitance between conductors at the same time: killer for differential wiring!
• Convenient, but window of usability small and shrinking
QuickCap capacitance extraction
• Field solving with floating random walk method
• Accuracy almost wholly a function of run time: 4x run time gives ½ the error
• Random walks independent, near perfect parallelization
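QuickCap's floating-random-walk solver is not reproduced here. The toy Monte Carlo estimate below, the fraction of random points inside a quarter circle, is a stand-in chosen only to show the statistics behind "4x run time gives ½ the error": with $N$ independent samples the standard error scales as $1/\sqrt{N}$. Because the samples are independent, separate workers could each run their own stream and pool the counts, which is the basis of near-perfect parallelization. Names are hypothetical.

```python
import math
import random
from typing import Tuple


def mc_estimate(n_samples: int, rng: random.Random) -> Tuple[float, float]:
    """Fraction of uniform points in the unit square that land inside the
    quarter circle (expected pi/4). Returns (estimate, standard error)."""
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        hits += x * x + y * y <= 1.0  # each sample is independent of all others
    p = hits / n_samples
    return p, math.sqrt(p * (1.0 - p) / n_samples)
```

Quadrupling the sample count (the run time) roughly halves the reported standard error.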
Comparing parasitic extraction
[Plot: delay (0 to 50 ps) vs. wire length (0 to 1200 µm) for Qcap RC, RCNET, PCAP, and Calc RC]
Cadence/QuickCap Design Flow
• Extract physical data from layout
• Compute RC with QuickCap
• Extract netlist from schematic
• Combine to simulate with Spectre
Partial manual extraction with QuickCap
• Identify the main wires of the oscillation paths: approx. a dozen pairs
• Use QuickCap to extract each wire's capacitance to ground and the capacitance between the wires of each pair
• Add an RC ladder for each pair by hand to the schematic and simulate
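When RC ladders are added by hand, the Elmore delay gives a quick sanity check on what each ladder contributes. The sketch below assumes a uniform wire split into equal R-C sections and driven from one end; the function and parameter names are hypothetical. For a differential pair whose wires switch in opposite directions, one common approximation is to add twice the pair's coupling capacitance to each wire's ground capacitance before splitting it into sections.

```python
def elmore_ladder(r_total: float, c_total: float, segments: int, r_driver: float = 0.0) -> float:
    """Elmore delay (seconds, for ohms and farads) at the far end of a uniform RC
    ladder with `segments` equal sections, driven through r_driver."""
    r = r_total / segments
    c = c_total / segments
    delay = 0.0
    for k in range(1, segments + 1):
        # Node k charges through all resistance between the driver and itself.
        delay += (r_driver + k * r) * c
    return delay
```

For example, `elmore_ladder(R, C, 1)` returns $RC$ for a single lumped section, and as `segments` grows the result approaches the distributed-line value $RC/2$.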
Simulation with Parasitic Extraction
Feedback  w/o         QuickCap    COEFGEN     Raphael     QuickCap
path      parasitics  parasitic   parasitic   parasitic   parasitic
          (ps)        cap. (ps)   cap. (ps)   cap. (ps)   RC (ps)
Cin       100         121         128         131         135
A1        103         123         130         129         137
A31       108         127         129         132         141
Pseudo-carry Tree configured as Ring Oscillator
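[Figure: pseudo-carry tree with operand inputs A and B (patterns 00...00 and 11...11), carry-in $C_{in}$, select inputs Sel0 and Sel1, and carry-out $C_{out}$]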
SMI00 Test Structure Layout
SMI00 Test Structure
Carry Tree High-speed Outputs
16 x 146 ps
Comparisons of published adders
Reference  Type   Size      Gate Del.  Time
ZIMM96     Carry  32        5          -
STEL96     Adder  64(32)    12.5(12?)  -
WANG97     Adder  32        3          2.7 ns
CHAN98     Adder  64(32)    27(19.5)   -
SILB98     Fixed  64        -          550 ps
AIPP99     Adder  64        -          660 ps
SAGE01     Adder  32[16x2]  -          <500 ps
MATH01     Adder  64        -          482 ps
STAS01     Adder  64        -          440 ps
LEE02      Adder  64                   900 ps
VANA02     ALU    32        8          <200 ps
Cascode Output Stage
• Eliminates Miller capacitance between input and output
• Reduces Cjc and Cjs on outputs
• Shortens rise time, but increases delay
Dotted Emitter/Collector
“Wide/Short” gate with dotted emitter/collector
“Wide/Short” gate with dotted emitter/collector
• Shorter trees lead to lower supply voltages
• Wider trees reduce the ratio of emitter followers to terms computed, lowering total current
• More inputs per look-ahead gate means fewer look-ahead levels
• Eliminating single-ended inputs on critical H signals allows faster switching with reduced swing
Even wider look-ahead gate
Width limited by:
• Accumulated Cjc and Cjs of the dotted-AND node
• Saturation vs. breakdown
• Fan-out loading from inputs and interconnect
Conclusions
• 32-bit addition depth reduced to 5 gates (fabricated); 4- and 3-gate-depth circuits designed.
• Gate to compute 3-way look-ahead fabricated; up to 8-way look-ahead designed.
• Carry delay for 32-bit addition measured at 146 ps.
• QuickCap technology file for 5HP brings simulated results within 11% of measured.