SiGe HBT BiCMOS Field Programmable Gate Arrays for
Fast Reconfigurable Computing
Bryan S. Goda
Rensselaer Polytechnic Institute
Troy, New York
Agenda
• Introduction
• BiCMOS FPGA History
• SiGe HBT BiCMOS Process
• Current Mode Logic
• Xilinx 6200 FPGA Design
• Configuration Memory
• Performance Results
• Conclusions and Future Work
Current Role of SiGe
• “More Zip per Chip”
• Wireless Phones -> Watch Sized Phone
• Direct Broadcast Satellite
• Fiber-Optic Lines, Switches, and Routers
Programmable Bipolar Logic
1983: Fairchild ECL Field Programmable Logic Array
• Fuse Based
• 4 ns Cycle Rate
• High Power
• Scaling Problems
1990: Algotronix 1.2 um 256 Cell Configurable Logic Array
• fT 6 GHz, 200 ps Gate Delay
• 4 Transistor Static RAM Memory Cells
• ASIC Emulation and Signal Processing
• Forerunner of XC6200
SiGe Heterojunction Bipolar Transistor
• Selectively introduce Ge into the base of a Si BJT
• Smaller Base Bandgap increases e- injection, giving higher Beta (100)
• Higher Beta allows a more heavily doped base, lowering RB (125 Ohm)
• Graded Bandgap decreases base transit time, raising fT
SiGe HBT
• 50 GHz Process, 100 GHz process within a year (30 uA at 50 GHz)
• 5 layers of metal
• Used in RPI VLSI Class
• Co-integrated with CMOS process
– can have HBT logic with CMOS memory
– low power and high speed
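Band Diagram
[Figure: SiGe HBT band diagram showing EC and EV across the n+ Si emitter, the p-SiGe base (compared with a p-Si base), and the n- Si collector, with e- and h+ flow. The Ge grading across the base produces a drift field. Annotations: Eg,Ge(x=0); Eg,Ge(grade) = Eg,Ge(x=Wb) - Eg,Ge(x=0); annotated value 0.031 eV]
Dielectric Constant: Si = 11.7, Ge = 16.2, SiGe (7.5% Ge) = 12.03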
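The quoted SiGe value is close to what a straight-line interpolation between the Si and Ge constants gives at 7.5% Ge. The sketch below assumes linear mixing by Ge fraction, which the slide does not state; the function name and defaults are illustrative only.

```python
def sige_dielectric_constant(ge_fraction, eps_si=11.7, eps_ge=16.2):
    """Linearly interpolate the relative permittivity of Si(1-x)Ge(x).

    Assumption: permittivity varies linearly with Ge fraction x.
    The Si and Ge defaults are the values quoted on the slide.
    """
    if not 0.0 <= ge_fraction <= 1.0:
        raise ValueError("ge_fraction must be between 0 and 1")
    return eps_si + ge_fraction * (eps_ge - eps_si)


# 7.5% Ge, as on the slide
print(f"{sige_dielectric_constant(0.075):.4f}")
```

The interpolated result is about 12.04, in line with the 12.03 quoted above.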
Current Steering Logic
Level     Voltage Swing          Role
Level 1   0 V to -250 mV         Fastest Logic Level, Limited Drive Capability
Level 2   -950 mV to -1.2 V      Inter-block Signal Level, Good Fan-Out (10)
Level 3   -1.90 V to -2.15 V     Clock Signal, Slowest Level
Supplies: Vcc = 0 V, Vee = -4.5 V
Level 4 Possible
Current Steering Logic In SiGe
• 13 ps Transistor Switching Time (75 GHz)
– 6 ps Process Next Year
• Small Voltage Swings (250 mV) vs 3.3 or 5 V
– Less Power
– Smaller Swing = Faster
• “Steer” Currents, Use Differential Logic
– Less Switch Noise
• Fewer Transistors Needed, Complement Signal Present
• Flip-Flops and Multiplexers Easy to Implement
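CML XOR Logic Schematic
[Figure: differential current tree between Vcc (0 V) and Vee (-4.5 V) with Vref. Input A and its complement at Level 1 (0 to -0.25 V), input B and its complement at Level 2 (-0.95 to -1.2 V), differential outputs A XOR B and its complement]

A  B  A XOR B
0  0  0
0  1  1
1  0  1
1  1  0

Simulated waveforms:
A (level 1):  1 0 1 1 0 1 1 1 0
B (level 2):  0 0 0 1 1 0 0 1 0
A XOR B:      1 0 1 0 1 1 1 0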
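A behavioral sketch of the XOR current tree may help show why differential logic carries the complement for free. It is not a circuit simulation: each signal is modeled as a (true, complement) pair, and the names `Diff`, `diff`, `cml_xor`, and `invert` are illustrative only.

```python
from typing import NamedTuple


class Diff(NamedTuple):
    """A differential signal: the true rail and its complement rail."""
    t: int
    c: int


def diff(bit: int) -> Diff:
    """Build a differential pair from a single logic value."""
    return Diff(bit, 1 - bit)


def cml_xor(a: Diff, b: Diff) -> Diff:
    """Two-level current tree, behaviorally.

    B (level 2) steers the tail current into one of two level 1 pairs.
    A (level 1) then steers it to the output or the complement load.
    The pair selected when B is high has A's rails cross-wired.
    """
    if b.t:
        return Diff(a.c, a.t)
    return Diff(a.t, a.c)


def invert(s: Diff) -> Diff:
    """Inversion is a wire swap, so it costs no transistors."""
    return Diff(s.c, s.t)


for a_bit in (0, 1):
    for b_bit in (0, 1):
        out = cml_xor(diff(a_bit), diff(b_bit))
        print(a_bit, b_bit, out.t, out.c)
```

Both the output and its complement come out of the same tree, which is why no separate inverter stage is needed.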
High Speed FPGA Applications
• Real Time Image Processing
- Radar
- Pattern Recognition
• Digital Networks
- Mobile Subscriber Equipment
- Command Information Systems
- High Speed Switching Nodes
• Control Systems
- Guidance Systems
- Reprogrammable Survivability
• Satellite Systems
Image Correlation
[Figure: Desired Image and Search Image]
1. Desired Image is programmed into chip (1 pixel = 1 CLB)
2. Load a section of search image
3. If enough pixels match, then turn found bit on
4. Load another section, or reprogram with new desired image
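The four steps can be sketched in software as a sliding-window match count. This is a minimal sketch assuming binary pixels, a raster scan over sections, and a caller-chosen match threshold; none of these details are given on the slide, and the function names are illustrative.

```python
def count_matches(desired, window):
    """Count pixel positions where the window equals the desired image."""
    return sum(
        d == w
        for desired_row, window_row in zip(desired, window)
        for d, w in zip(desired_row, window_row)
    )


def correlate(desired, search, threshold):
    """Return (row, col) of every section where the found bit would turn on.

    desired: 2-D list of 0/1 pixels (one pixel per CLB on the chip)
    search: larger 2-D list of 0/1 pixels, loaded one section at a time
    threshold: minimum number of matching pixels (assumed parameter)
    """
    height, width = len(desired), len(desired[0])
    hits = []
    for r in range(len(search) - height + 1):
        for c in range(len(search[0]) - width + 1):
            # Step 2: load a section of the search image
            window = [row[c:c + width] for row in search[r:r + height]]
            # Step 3: turn the found bit on if enough pixels match
            if count_matches(desired, window) >= threshold:
                hits.append((r, c))
    return hits


desired = [[1, 0],
           [0, 1]]
search = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]
print(correlate(desired, search, threshold=4))
```

On the chip the per-pixel comparisons happen in parallel, one per CLB; the nested loops here only stand in for loading successive sections.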
FPGA Drawbacks
• Slowdown
– 200 MHz Internal Speed down to 30-60 MHz External
– Pass Transistor = Low Pass Filter
• Limited Bandwidth
• Relatively Long Configuration Times (Seconds)
• Vendor Guarded Information
• More Expensive than Comparable ASIC
Pass Transistor Interconnect Modeling
[Figure: four-node interconnect switch (nodes 1 to 4) built from pass transistors, each controlled by a memory cell (M), and the equivalent circuit from node 3 to node 2 with the connecting pass transistor on]
Field Programmable Gate Arrays (FPGA)
• Hierarchy Level Organization (Sea of Gates)
– Simple Cells (Configurable Logic Blocks)
– 4x4, 16x16, 64x64 groupings
– Hierarchy of routing resources at each level
– I/O Blocks (external interface)
Design Parameters
• Logic Swing Levels
- Based on Differential Pair Switching
- Current Levels
• Redesign of the Configurable Logic Block
- Take Advantage of Differential Wiring
- What Parts Can be Turned off if not Used?
• Supply Levels
- How Many Levels of Logic?
• Routing Resources
• CMOS Voltage Levels
- Integrate CMOS into Bipolar Current Tree
Current Tree with CMOS Routing
[Figure: 4:1 multiplexer current tree between VCC (0 V) and Vee (-3.4 V) with Vref and differential outputs OUT and its complement. Level 1 inputs a, b, c, d and their complements (Level 1: 0 to -0.25 V); Level 2 inputs S1 (Level 2: -0.95 to -1.2 V); Level 3 inputs S2 (Level 3: -1.9 to -2.15 V); Level 1 outputs. An inset marked "Replace with" shows the CMOS version, W/L 5:1]
Sample Logic Using Multiplexers
A AND B (X1 := a, X2 := b, X3 := a)
• If a=1 then select Y2, output = b
• If a=0 then select Y3, output = 0
A OR B (X1 := a, X2 := a, X3 := b)
• If a=1 then select Y2, output = 1
• If a=0 then select Y3, output = b
[Figure: 2:1 multiplexers with select X1 (1 selects Y2, 0 selects Y3), drawn for each function with Non-Inverted Output and Inverted Output]
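The multiplexer assignments above can be checked with a few lines of code. This is a behavioral sketch only; `mux2`, `and_gate`, and `or_gate` are illustrative names, and the X1/X2/X3 assignments follow the select rules stated on the slide.

```python
def mux2(select, y2, y3):
    """2:1 multiplexer: select = 1 passes Y2, select = 0 passes Y3."""
    return y2 if select else y3


def and_gate(a, b):
    # X1 := a, X2 := b, X3 := a
    # a = 1 selects Y2 (= b); a = 0 selects Y3 (= a, which is 0)
    return mux2(a, b, a)


def or_gate(a, b):
    # X1 := a, X2 := a, X3 := b
    # a = 1 selects Y2 (= a, which is 1); a = 0 selects Y3 (= b)
    return mux2(a, a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b))
```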
Redesign of XC6200 Logic
Original XC6200 Design
• Have to Track Inversions
Revised Design
• Use Differential Pair Logic
• Eliminate XC6200 Fast Logic
• No Inversion Tracking
[Figure: Original XC6200 Architecture. Labels: X1, X2, X3, Y2, Y3, multiplexer inputs 1 and 0, RP Multiplexer, CS Multiplexer, D flip-flop (D, Clk, Clr, Q and its complement), F, C, S]
[Figure: Redesigned Architecture. Same labels, with Clr marked Switchable]
CLB Layout
[Layout figure: Bipolar with CMOS Routing. Labeled regions: 4:1 Mux; High Speed Logic; 2:1 Mux; CMOS Control; Buffer; 4:1 Mux (off switchable); Master/Slave Latch (off switchable)]
[Figure: CLB with inputs X1, X2, X3 and output F. Incoming CLB Routing and Outgoing CLB Routing each carry N, S, E, W, N4, S4, E4, W4]
4x4 Block Boundary Routing
[Figure: N Switches, S Switches, E Switches, and W Switches along the edges of the 4x4 block]
Local Routing
Magic Routing
Length 4 FastLane (4x4)
Length 16 FastLane (16x16)
Chip Length FastLane (64x64)
Local CLB Routing
[Figure: CLB with inputs X1, X2, X3 and output F, with routing on N, S, E, W, N4, S4, E4, W4]
Output multiplexers:
Wout selects from N, S, W, F
Sout selects from S, E, W, F
Eout selects from N, S, E, F
Nout selects from N, E, W, F
• Nearest Neighbor Routing
• Output (F) or Local Through
Example: Route East Signal Through to Next CLB
Note: Can’t Route Signal Back to Origin at this Level
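The no-route-back rule can be written as a small lookup. The sketch assumes signals are named by their direction of travel, so the signal travelling opposite to an output came from the CLB that output drives; that reading, and the names `output_mux_sources` and `route`, are assumptions rather than anything stated on the slide.

```python
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def output_mux_sources(direction):
    """Signals selectable by the output multiplexer driving `direction`.

    Every direction except the one opposite to `direction` is allowed,
    plus the CLB's own function output F.
    """
    blocked = OPPOSITE[direction]
    return [d for d in ("N", "S", "E", "W") if d != blocked] + ["F"]


def route(direction, source):
    """Describe a local route, or raise if it would go back to its origin."""
    if source not in output_mux_sources(direction):
        raise ValueError(f"{source} cannot drive {direction}out at this level")
    return f"{source} -> {direction}out"


# Example from the slide: route the East signal through to the next CLB
print(route("E", "E"))
for d in ("W", "S", "E", "N"):
    print(f"{d}out:", output_mux_sources(d))
```

Each output multiplexer ends up with four selectable sources, matching the four lists shown for Wout, Sout, Eout, and Nout.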
Normal CMOS Memory-CML Interface
[Figure: New Configuration Data is written to SRAM Bits In Memory Planes (VSS); a decode stage and a CMOS to CML Buffer (VEE, VREF) drive the CLB Multiplexer Inputs]
Memory Design
• D Latch M/S: 40 Transistors
• D Latch M/S: 18 Transistors
• RAM Cell: 6 Transistors, Parallel Load
[Figure: latch and RAM cell schematics with ports D, Q, Clock/CLK, Data, Word, Out]
Layout of Configurable Logic Block with 2 sets of RAM
[Layout figure. Labeled regions: RAM; 2:1 Mux; 8:1 Mux (routing); CMOS Selects; CLB (logic); Master/Slave Latch (memory)]
Circuit Elements:
• 240 nfets
• 122 pfets
• 36 resistors
• 98 npn1 HBTs
• 16 npnhb1 HBTs
SiGe Performance
Circuit Type         Propagation Delay
Buffer               17 ps
CML XOR, AND, OR     22-25 ps
MUX XOR, AND, OR     23-26 ps
CLB                  100 ps
Power Decreasing Ideas
Date      Idea                                                     Power Consumption/CLB
Dec 98    Original CLB                                             73 mW
June 99   CLB Redesign I                                           34 mW
Aug 99    CLB Redesign II                                          24 mW
Dec 99    Widlar Current Mirror with CMOS Control, CMOS Routing    10.8 mW
Mar 00    Supply Voltage 4.5 -> 3.3 V                              7 mW
Dec 00*   7HP Process                                              0.3 mW
* Projected Power Levels for 7HP Process: At 50 GHz, 30 uA, 20x+ reduction in power
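The step-by-step and cumulative reductions implied by these figures can be worked out directly. The numbers below are the per-CLB power values from the slide; the function name and tuple layout are illustrative only.

```python
# (date, idea, power per CLB in mW), taken from the slide
POWER_STEPS = [
    ("Dec 98", "Original CLB", 73.0),
    ("June 99", "CLB Redesign I", 34.0),
    ("Aug 99", "CLB Redesign II", 24.0),
    ("Dec 99", "Widlar Current Mirror with CMOS Control, CMOS Routing", 10.8),
    ("Mar 00", "Supply Voltage 4.5 -> 3.3 V", 7.0),
    ("Dec 00", "7HP Process (projected)", 0.3),
]


def reduction_report(steps):
    """Return (date, idea, mW, factor vs previous step, factor vs first step)."""
    baseline = steps[0][2]
    previous = baseline
    report = []
    for date, idea, milliwatts in steps:
        report.append((date, idea, milliwatts,
                       previous / milliwatts, baseline / milliwatts))
        previous = milliwatts
    return report


for date, idea, mw, step, total in reduction_report(POWER_STEPS):
    print(f"{date:8s} {mw:6.1f} mW  x{step:5.1f} step  x{total:6.1f} total  {idea}")
```

The last step, 7 mW to a projected 0.3 mW, works out to roughly 23x, consistent with the "20x+" footnote.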
XC6200 Design Improvements
• Developed at the University of Edinburgh, Scotland
• Inversion of Signal at Every CLB
- Taken care of due to differential pair wiring
• No Pass Transistors, Use Multiplexers for Routing
• Able to turn off unused parts with CMOS controlled current mirror
• No CMOS-CML Conversion circuits needed, CMOS in current trees
• Handcrafted, dense layouts
• Context Switching
Power Delay Product
[Figure: power-delay product in uW/gate/MHz (log scale, 0.001 to 1) versus Year (1998 to 2002) for PDP BiCMOS, PDP CMOS High, and PDP CMOS Low, with the 5HP, 7HP, and 8HP processes marked]
Bit Line Twisting
[Figure: differential line pairs A, B, C and their complements, showing a Slow Transition and a Fast Transition]
• Data Dependent Switching
• Could Vary Signals Up to 30%
• Setup Time Violations
• Differential Logic has Complement Switching In Opposite Direction
Future Work
• Testing
• Overall FPGA Architecture
• Scaling
• Integrate with Other Systems
• Projected Graduation May 2001, work to continue at USMA
• Power Reduction
- 7HP Process
CLB Context Switch Example
Pattern1:               0001100100   (70 ps ~ 7.1 GHz)
Pattern2:               1011011100   (70 ps)
Select:                 AND OR AND OR
Pattern1 AND Pattern2:  0001000100
Pattern1 OR Pattern2:   1011111100
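The example can be reproduced with a behavioral model in which the select line picks one of two stored configurations. This is a minimal sketch: the dictionary of configuration planes and the function names are illustrative, and the frequency helper assumes 70 ps is the bit time, so that an alternating 1-0 pattern has a 140 ps period.

```python
# Two stored configurations, one per RAM set (illustrative model)
CONFIG_PLANES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def run_clb(plane, pattern1, pattern2):
    """Apply the selected configuration bit by bit to two bit strings."""
    op = CONFIG_PLANES[plane]
    return "".join(str(op(int(a), int(b))) for a, b in zip(pattern1, pattern2))


def toggle_frequency_ghz(bit_time_ps):
    """Frequency of an alternating 1-0 pattern, assuming one bit per bit_time_ps."""
    return 1e3 / (2 * bit_time_ps)


PATTERN1 = "0001100100"
PATTERN2 = "1011011100"

for plane in ("AND", "OR", "AND", "OR"):   # the select sequence on the slide
    print(plane, run_clb(plane, PATTERN1, PATTERN2))
print(f"{toggle_frequency_ghz(70):.2f} GHz")
```

Under that bit-time assumption, 70 ps corresponds to about 7.14 GHz, in line with the "~ 7.1 GHz" label.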
Redesigned CLB Cell with Routing and Memory (2x)
[Layout figure. Labeled regions: 2x24 Bit RAM; Three 8-1 Input Mux; CLB; Four 4-1 Output Mux]
CLB Row 4x1
[Layout figure. Labels: M1, M2, M3, M4; Switch; Memory Bus Lines; N/S Input; Output]
Circuit Elements
• 1520 Nfets
• 792 Pfets
• 260 Resistors
• 140 NPN1 HB
• 576 NPN1
XC6200 Device Family
Device          XC6209    XC6216    XC6236    XC6264
Gate Count      9-13K     16-24K    36-55K    64-100K
Number Cells    2304      4096      9216      16384
I/O Blocks      192       256       384       512
Row x Col       48x48     64x64     96x96     128x128
Typical Routing Delays
Symbol    Parameter                    XC6200    SiGe Redesign
TNN       Route Nearest Neighbor       1 ns      23 ps
Tmagic    Route X2/X3 to Magic Out     1.5 ns    47 ps
TL4       Length 4 FastLane            1.5 ns    47 ps
TL16      Length 16 FastLane           2 ns      70 ps
TCL64     Chip-Length (64) Delay       3 ns      94 ps
~31x improvement
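The per-route ratios behind the improvement figure can be computed from the table. The delays below are the slide's values converted to picoseconds; the dictionary and function names are illustrative.

```python
from statistics import median

# symbol: (XC6200 delay in ps, SiGe redesign delay in ps), from the table
ROUTING_DELAYS_PS = {
    "TNN": (1000.0, 23.0),
    "Tmagic": (1500.0, 47.0),
    "TL4": (1500.0, 47.0),
    "TL16": (2000.0, 70.0),
    "TCL64": (3000.0, 94.0),
}


def speedups(delays):
    """Ratio of XC6200 delay to SiGe redesign delay for each route."""
    return {name: xc6200 / sige for name, (xc6200, sige) in delays.items()}


ratios = speedups(ROUTING_DELAYS_PS)
for name, ratio in ratios.items():
    print(f"{name:7s} {ratio:5.1f}x")
print(f"median  {median(ratios.values()):5.1f}x")
```

The ratios range from about 29x to 43x, and three of the five routes come out near 31.9x.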
4x4 CLB Layout Cell
• Largest Basic Block
• Over 13,000 Transistors
• Commercial Product Size is a 4x4 Array of this Cell
5 Stage Ring Oscillator
Condition     Speed       Relative to Schematic    Current
Schematic     6.36 GHz    --                       8.4 mA
Parasitics    5.71 GHz    89%                      8.6 mA
50 °C         5.26 GHz    82%                      8.85 mA
75 °C         4.87 GHz    76%                      9.1 mA
100 °C        4.16 GHz    65%                      9.34 mA
125 °C        3.12 GHz    49%                      9.5 mA
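The relative-speed column and an implied per-stage delay can be derived from the frequencies. The sketch assumes the standard ring-oscillator relation f = 1 / (2 N t_d) applies to this 5-stage ring, and that the percentages were truncated rather than rounded; neither is stated on the slide.

```python
import math

# condition: oscillation frequency in GHz, from the table
RING_OSC_GHZ = {
    "Schematic": 6.36,
    "Parasitics": 5.71,
    "50 C": 5.26,
    "75 C": 4.87,
    "100 C": 4.16,
    "125 C": 3.12,
}


def stage_delay_ps(freq_ghz, stages=5):
    """Per-stage delay from f = 1 / (2 * N * t_d), returned in picoseconds."""
    return 1e3 / (2 * stages * freq_ghz)


def relative_percent(freq_ghz, reference_ghz=6.36):
    """Speed relative to the schematic simulation, truncated to a whole percent."""
    return math.floor(100 * freq_ghz / reference_ghz)


for condition, freq in RING_OSC_GHZ.items():
    print(f"{condition:11s} {freq:5.2f} GHz  "
          f"{relative_percent(freq):3d}%  {stage_delay_ps(freq):5.1f} ps/stage")
```

Under that relation the schematic frequency corresponds to roughly 15.7 ps per stage, close to the 17 ps buffer delay quoted under SiGe Performance.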