Post on 18-Jan-2020
Challenges in VLSI Design Toward the New Millennium
Takayasu Sakurai
Prof. at Center for Collaborative Research, and Institute of Industrial Science, University of Tokyo
E-mail: tsakurai@iis.u-tokyo.ac.jp
1 Scaling and three crises
2 Power crisis
3 Interconnection crisis
4 Complexity crisis
IDEC ’99/10
T.Sakurai
Silicon Age
Info processing by LSI (Si)
Info transfer by fibers (SiO2)
[Figure: relative production (log scale, 10 to 10000) vs. year (1960 to 2000) for oil, steel, and silicon]
By Yoshio Nishimura
Memory
Processors
Sensors
Communicators
World-wide Semiconductor Market
[Figure: market size (Billion $, log scale, 10 to 10000) vs. year (1980 to 2010) for semiconductor, steel, automobile, electric, and GNP; by Starc]
World semiconductor market
Data : World semicon market statistics
[Figure: market size (Billion $, 0 to 200) vs. year (1986 to 2001) by category: MOS Logic, MOS uP, MOS Memory, Digital Bipolar, Analog, Discrete]
Moore’s Law
[Figure: device count per chip (log scale, 100 to 10G) vs. year (1970 to 2010) for DRAM (1M, 4M, 16M, 64M, 256M, 1G, 2G) and µP]
System LSI for Next Generation Games
Clock freq. 300MHz
10M transistors
Graphics synthesizer integrates 40M transistors with embedded DRAM
Memory bandwidth 3.2GB/s
Floating-point operation 6.2GFLOPS
3D CG 6.6M polygon/sec
MPEG2 decode
Applications of System LSI’s
PC & peripherals: PC, printer, game, PDA, hard disk, CD-ROM, display
Communication / network: communication, LAN/WAN, mobile phone, wireless network, fax, modem
Digital consumer: digital TV, digital camera, digital movie, car navigation, DVD, CD, MD
Limit of Miniaturization
Conventional I-V curve at 0.04µm (even down to 0.014µm)
[Figure: 0.04µm MOSFET (gate length = 40 nm). Drain current [mA/µm] (0 to 0.63) vs. drain voltage [V] (0 to 2.0) for Vg = 0.8, 1.2, 1.6, and 2.0 V]
M. Ono, M. Saito, T. Yoshitomi, C. Fiegna, T. Ohguro, and H. Iwai, "Sub-50nm gate Length N-MOSFETs with 10 nm Phosphorus Source and Drain Junctions", IEDM Technical Digest, pp. 119-122, 1993.
H. Kawaura, T. Sakamoto, Y. Ochiai, J. Fujita, and T. Baba, "Fabrication and Characterization of 14-nm-Gate-Length EJ-MOSFETs", Extended Abstracts of SSDM, pp. 572-573, 1997.
Scaling law
T.Sakurai&A.Newton,"Alpha-power law MOSFET model and its application to CMOS inverter delay and other formulas",IEEE JSSC,vol25, no,2, pp.584-594, Apr. 1990.
Numbers are exponents of k (each quantity scales as k^n). K=2.

Transistor
Voltage [V]                         -1
Tr. size [x]                        -1
Oxide thickness [t]                 -1
Current [I ~ V^1.3/t]               -0.3
Tr. capacitance [Cg ~ x^2/t]        -1
Tr. delay [Tg ~ Cg V/I]             -1.7
Tr. power [Pg ~ Cg V^2/Tg]          -1.3
Tr. power density [p ~ Pg/x^2]      0.7
Tr. density [n ~ 1/x^2]             2

Interconnection                     Local   Middle   Global   VDD/VSS
Length [L]                          -1      -0.5     0        0
Width [W]                           -1      -0.5     0        1
Thickness [T]                       -1      -0.5     0        1
Height [H]                          -1      -0.5     0        0
Resistance [Rm ~ L/W/T]             1       0.5      0        -1
Capacitance [Cm ~ LW/H]             -1      -0.5     0        1
RC delay/Tr. delay [Tm ~ RmCm/Tg]   1.7     1.7      1.7      -
Current density [J ~ pLW/V/W/T]     -       -        -        0.7
Dc noise [SNdc ~ JWLRm/V]           -       -        -        1.7

$I_{ds} = \frac{\mu\varepsilon}{2}\,\frac{W}{t_{ox}L}\,(V_{gs}-V_t)^{\alpha} \sim V^{\alpha}/t, \quad \alpha = 1.3$
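The transistor exponents in the scaling table follow from the three base assumptions (voltage, size, and oxide thickness all scale as k^-1) and the proportionalities given in brackets. The sketch below re-derives them; the function names are illustrative and not from the slides, and the local-wire case assumes the "Local" column of the interconnection table.

```python
# Exponents n such that a quantity scales as k**n when the design is scaled by k.
# Base assumptions from the table: voltage, size and oxide thickness scale as k**-1.

def transistor_exponents(alpha=1.3):
    v, x, t = -1.0, -1.0, -1.0
    current = alpha * v - t            # I  ~ V**alpha / t
    cap = 2 * x - t                    # Cg ~ x**2 / t
    delay = cap + v - current          # Tg ~ Cg * V / I
    power = cap + 2 * v - delay        # Pg ~ Cg * V**2 / Tg
    power_density = power - 2 * x      # p  ~ Pg / x**2
    density = -2 * x                   # n  ~ 1 / x**2
    return {
        "current": current,
        "capacitance": cap,
        "delay": delay,
        "power": power,
        "power_density": power_density,
        "density": density,
    }

def local_wire_rc_over_gate_delay(alpha=1.3):
    # Local interconnect: L, W, T, H all scale as k**-1.
    length = width = thickness = height = -1.0
    resistance = length - width - thickness     # Rm ~ L / W / T
    capacitance = length + width - height       # Cm ~ L * W / H
    return resistance + capacitance - transistor_exponents(alpha)["delay"]

if __name__ == "__main__":
    k = 2.0
    for name, n in transistor_exponents().items():
        print(f"{name:14s} k^{n:+.1f} -> x{k ** n:.2f}")
```

With k = 2 the power-density and RC-delay exponents (0.7 and 1.7) give factors of about 1.6 and 3.2.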
Scaling Law
[Figure: MOSFET (drain, gate, source) at 0.2 micron, scaled to size 1/2]
Favorable effects: Size x1/2, Voltage x1/2, Electric field x1, Speed x2, Cost x1/4
Unfavorable effects: Power x1.6, RC delay/Tr. delay x3.2, Current density x1.6, Voltage noise x3.2, Design complexity x4
Three crises in VLSI designs
Power crisis
Interconnection crisis
Complexity crisis
Ever Increasing VLSI Power
(Power consumption of processors published in ISSCC)
[Figure: power (W) (log scale, 0.01 to 100) vs. year (1980 to 1995); trend x4 / 3 years]
VDD, Power and Current Trend
International Technology Roadmap for Semiconductors (1998)
[Figure: voltage [V] (0 to 2.5), power per chip [W] (0 to 200), and VDD current [A] (0 to 500) vs. year (1998 to 2014)]
Necessity for Low-Power Design
Power range | Typical applications (all need high perf.) | Concerns
< 0.1W | Portable: PDA, communications | Battery life
~ 1W | Consumer: set-top box, audio-visual | Inexpensive package limit; system heat (10W / box)
> 10W | Processor: high-end MPU's, multimedia DSP's | Ceramic package limit; IR drop of power lines
Trend in Computer
[Figure: price of main-stream computer (log scale, $1K to $1M) vs. year (1950 to 2010)]
Main-stream computer: historical ($3M), large scale ($3M), office / middle ($300K), WS ($30K), PC ($3000), ??? ($300)
Device technology: vac. tube, transistor, IC, LSI, VLSI, system LSI
DOWNSIZING is the keyword
Computer-Communication-Consumer
E-cashing, E-trading, E-banking
Cell phone, Ticketing, Reservation
Home automation, Game on net, Entertainment
Internet, Web browse, Web TV, E-mail
PDA, Schedule, Address book
Computer centric → Communication centric, Display centric
Performance Requirements for Multimedia
[Figure: required performance (MOPS) (log scale, 10 to 100000), present to future, for FAX/modem, sound, speech recognition, TV conf. (H.324...), MPEG1 decoding, MPEG1 encoding, MPEG2 decoding, MPEG2 encoding, 2D/3D graphics, HDTV decoding, HDTV encoding]
What sets the technology trend?
NMOS → CMOS: cost up
Bipolar → CMOS: speed down
Neither cost nor speed but power sets the technology trend.
Integration can achieve low cost and high speed as a system.
Expression for CMOS Power
$P = \alpha C_L V_S V_{DD} f_{CLK} + \alpha I_{SC} \Delta t_{SC} V_{DD} f_{CLK} + I_{DC} V_{DD} + I_{LEAK} V_{DD}$
The four terms are, in order: charging & discharging, crowbar current, static current, subthreshold leak current.
[Figure: load CL charged from VDD and then discharged; Q = CL VS]
CL • VS amount of charge loses VDD of potential -> CL • VDD • VS energy consumption per cycle
α : Switching probability
CL : Load capacitance
VS : Signal swing
VDD : Supply voltage
ISC : Mean crowbar current
∆tSC : Crowbar current duration
fCLK : Clock frequency
IDC : DC current
ILEAK : Subthreshold leak current
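As a rough illustration of the CMOS power expression, the sketch below evaluates its four terms separately. The function name and the numerical values (0.1 switching probability, 1 nF total load, 100 MHz) are assumptions chosen for the example, not data from the slides.

```python
def cmos_power(alpha, c_load, v_swing, v_dd, f_clk,
               i_sc=0.0, dt_sc=0.0, i_dc=0.0, i_leak=0.0):
    """Return the four power components in watts (SI units throughout)."""
    switching = alpha * c_load * v_swing * v_dd * f_clk   # charging & discharging
    crowbar = alpha * i_sc * dt_sc * v_dd * f_clk         # short-circuit (crowbar) current
    static = i_dc * v_dd                                  # DC current
    leak = i_leak * v_dd                                  # subthreshold leak current
    return {
        "switching": switching,
        "crowbar": crowbar,
        "static": static,
        "leak": leak,
        "total": switching + crowbar + static + leak,
    }

# Hypothetical chip: full-swing signals at 3.3 V.
full_swing = cmos_power(alpha=0.1, c_load=1e-9, v_swing=3.3, v_dd=3.3, f_clk=100e6)
# Same chip with the signal swing halved while VDD is kept.
half_swing = cmos_power(alpha=0.1, c_load=1e-9, v_swing=1.65, v_dd=3.3, f_clk=100e6)

if __name__ == "__main__":
    print(full_swing["switching"], half_swing["switching"])
```

Because the switching term is linear in VS and in VDD, halving the swing alone halves that term, while lowering VDD together with the swing reduces it quadratically.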
Voltage waveform of CMOS inverter
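CIN = 10 [pF], COUT = FO • CIN
[Figure: target inverter (ID0P, ID0N) driven by an input inverter (ID0PIN, ID0NIN); input voltage, output voltage, and short-circuit current vs. time, with thresholds vTHP, vTHN and times t0, t1, tT]
$C_{OUT}\frac{dV_{OUT}}{dt} = -I_{DN} \cong -I_{D0N}\left(\frac{V_{GSN}-V_{THN}}{V_{DD}-V_{THN}}\right)^{\alpha_N}$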
Short-circuit power dissipation formula
[Equations: closed-form expression for the short-circuit power PS in terms of k(vD0P), foP, FO, βr, CIN, VDD, and f, with auxiliary functions g and h of the normalized thresholds vTN, vTP and of αN, αP]
$k(v_{D0P}) = \frac{0.9}{0.8} + \frac{v_{D0P}}{0.8}\ln\frac{10\,v_{D0P}}{e}$
$FO = C_{OUT}/C_{IN}$ (fanout)
$fo_P = I_{D0P}/I_{D0PIN}$
$\beta_r = I_{D0P}/I_{D0N}$
K. Nose and T. Sakurai, "Closed-Form Expressions for Short-Circuit Power of Short-Channel CMOS Gates and Its Scaling Characteristics," ITC-CSCC (Korea), July 1998.
Comparison between proposed formula and other formula
Verumu et al.'s formula deviates from SPICE simulation when fanout > 3 and when fanout is small (diverges to infinity).
[Figure: short-circuit power [pW] (0 to 15) vs. fanout FO (1 to 5) for SPICE simulation, Verumu formula, and this work; f = 1 [Hz], Tech. A, CIN = 10 [pF]]
The change of the short-circuit power dissipation with scaling
[Figure: ηP = PS/(PD+PS) (0 to 0.2) vs. VDD [V] (0 to 5) for VTH/VDD = 0, 0.1, 0.2, 0.3; fanout = 1]
Voltage dependent gate cap. effect
[Figure: gate capacitance [fF] (0 to 200) vs. gate voltage VG [V] (-2 to 1) for VDS = 0V (linear) and VDS = 1V (saturation), compared with I(COX); W/L = 100µm/0.4µm, VTH = 0.3V]
Voltage dependent gate cap. effect
[Figure: average gate current (average Cgate) (0 to 4) vs. VTH / VDD (-0.2 to 0.6) for VDD = 1V and VDD = 0.5V, compared with I(COX); FO = 5]
[Figure: inverter delay [nsec] (0 to 2.5) vs. VTHout / VDD (-0.4 to 1.2), compared with COX; VDD = 0.5V, VTH = 0.2, FO = 5, VTH = VTHOUT; annotation: large C, delay]
Power & Delay Dependence on VDD & VTH
Power: $P = p_t f_{CLK} C_L V_{DD}^2 + I_0\,10^{-V_{th}/S}\,V_{DD}$
Delay: $\mathrm{Delay} = \frac{kQ}{I} = \frac{k\,C_L V_{DD}}{(V_{DD}-V_{th})^{\alpha}} \quad (\alpha = 1.3)$
[Figure: power (W) (0 to 1x10^-4) and delay (s) (0 to 5x10^-10) as surfaces over Vth (V) (-0.4 to 0.8) and VDD (V) (1 to 4), with points A and B marked]
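The trade-off expressed by the power and delay formulas can be explored numerically. The sketch below is a minimal model under assumed parameter values (pt, fCLK, CL, I0, S, and k are illustrative placeholders, and k is left as a unitless proportionality constant), so only the trends are meaningful, not the absolute numbers.

```python
def power(v_dd, v_th, p_t=0.1, f_clk=100e6, c_load=1e-12, i0=1e-6, s=0.1):
    """P = pt*fCLK*CL*VDD^2 + I0*10^(-Vth/S)*VDD, with S in volts per decade."""
    dynamic = p_t * f_clk * c_load * v_dd ** 2
    leakage = i0 * 10 ** (-v_th / s) * v_dd
    return dynamic + leakage

def delay(v_dd, v_th, k=1.0, c_load=1e-12, alpha=1.3):
    """Delay = k*CL*VDD / (VDD - Vth)^alpha; valid only for VDD > Vth."""
    if v_dd <= v_th:
        raise ValueError("model requires VDD > Vth")
    return k * c_load * v_dd / (v_dd - v_th) ** alpha

if __name__ == "__main__":
    for v_dd in (1.0, 2.0, 3.0):
        for v_th in (0.0, 0.2, 0.4):
            print(v_dd, v_th, power(v_dd, v_th), delay(v_dd, v_th))
```

Lowering Vth at fixed VDD shortens the delay but raises the leakage term exponentially, which is the tension the two surfaces illustrate.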
Lowering Only Internal VDD (Example)
[Figure: chip with 3V input and 3V output whose internal VDD (VDDINT) of 1.5V is generated from VDDEXT (3V) by a DC-DC converter. Labels: switching DC-DC converter, efficiency > 95%; efficiency <= 50%; swing conv. 1 and swing conv. 2 between 0~3V and 0~1.5V signals; level conv. 1 and level conv. 2; leak at the 1.5V / 3V interface]
Standby Power Reduction (SPR) Circuit
ISSCC'95 pp.318-31
[Figure: level shifter and voltage switch (M1 to M5, nodes V1 to V4, CW) controlled by St'by. Supplies: VDD (2V), VSS (0V), VPBB (-2V), VNBB (4V); well voltages VNWELL (2 or 4V) and VPWELL (0 or -2V). Annotation: "are added to ensure reliability"]
• In standby mode and in IDDQ test, substrate bias is applied to increase VTH, which reduces leakage.
• In active mode, substrate bias is not applied, so VTH stays low, which ensures high speed.
Self-Adjusting Threshold-voltage Scheme(SATS)
CICC'94, pp.271-274
[Figure: a leakage sensor (VGN1, leak current) switches the Self-Sub-Bias circuit (SSB) ON/OFF; the SSB drives the p-well bias VBBN]
low Vth → large leakage → SSB ON → deep VBB → high Vth
high Vth → little leakage → SSB OFF → shallow VBB → low Vth
• control Vth to adjust leakage current
• compensate Vth fluctuation
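The SATS feedback loop can be illustrated with a small behavioral simulation. This is a sketch under stated assumptions, not the circuit from the CICC'94 paper: the leakage model I0*10^(-Vth/S), the linear body-effect coefficient `gamma`, the bias step, and all numerical values are hypothetical.

```python
def leakage(v_th, i0=1e-3, s=0.1):
    """Subthreshold leakage model (assumed): I0 * 10^(-Vth/S)."""
    return i0 * 10 ** (-v_th / s)

def sats_settle(v_th_processed, i_target, gamma=0.2, step=0.01,
                max_bias=3.0, cycles=1000):
    """Bang-bang loop: deepen the body bias while leakage exceeds the target."""
    bias = 0.0                                    # magnitude of VBB (0 = shallow)
    for _ in range(cycles):
        v_th = v_th_processed + gamma * bias      # assumed linear body effect
        if leakage(v_th) > i_target:
            bias = min(bias + step, max_bias)     # SSB ON  -> deeper VBB
        else:
            bias = max(bias - step, 0.0)          # SSB OFF -> shallower VBB
    return v_th_processed + gamma * bias, bias

if __name__ == "__main__":
    target = leakage(0.3)
    for processed in (0.10, 0.20, 0.25):
        v_th, bias = sats_settle(processed, target)
        print(f"processed Vth {processed:.2f} V -> settled Vth {v_th:.3f} V, bias {bias:.2f} V")
```

Chips whose as-processed Vth differs end up at nearly the same effective Vth, which is the sense in which the loop compensates Vth fluctuation. A chip whose Vth is already above the target simply stays at zero bias in this model.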
Measured Ileak in SAT+SPR
[Figure: two panels (T=27 and T=70) of IDD.leak (A) (log scale, 1E-6 to 1E+0) vs. (|VTH.p|+VTH.n)/2 as processed (V) (-0.1 to 0.5) for conventional CMOS, VTCMOS in active mode, and VTCMOS in standby mode]
Multi-Threshold CMOS Circuit
[Figure: MTCMOS logic (low-VTH CMOS logic in series with Hi-VTH MOSFETs gated by St'by) and MTCMOS latch (D, Q; low-VTH and Hi-VTH devices gated by St'by)]
In active mode, low-VTH MOSFET's achieve high speed. In standby mode, when the St'by signal is high, high-VTH MOSFET's in series with the normal logic circuits cut off leakage current.
VTCMOS / MTCMOS
[Figure: VTCMOS (low-Vth logic between VDD and GND with VT control applied to the p-well and n-well) and MTCMOS (low-Vth logic on VDDL, with a Hi-Vth MOSFET gated by St'by between the logic and GND)]
Principle
VTCMOS: Threshold control with sub-bias
MTCMOS: On-off control of internal VDD/VSS
Merit/Demerit
VTCMOS | MTCMOS
o Low leakage in standby | o Low leakage in standby
- Needs circuit development | + Conceptually easier
+ Compensate Vth fluctuation | - Compensate Vth fluctuation
+ IDDQ test | - IDDQ test
+ No serial MOSFET | - Large serial MOSFET (slower, larger, lower yield...)
o Conventional design tools | o Conventional design tools
+ Reuse of existing design | - Special F/F's
- Triple well is desirable | - Two VTH's
Concept of Super Cut-off CMOS (SCCMOS)
[Figure: pMOS insertion case. A low-VTH cut-off MOSFET connects VDD (0.5 - 0.8V) to the virtual VDD of the low-VTH logic circuit. Gate of the cut-off MOSFET: VSS in active mode, VDD+0.4V in standby]
H.Kawaguchi and K.Nose, T.Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with pico-Ampere Standby Current," 1998 ISSCC, Digest of Tech. Papers, pp.192-193, Feb. 1998.
Super Cut-off CMOS Scheme (SCCMOS)
[Chip photograph: NOR, NAND, inverter, pulse generator (EX-OR), flip-flop oscillator, cut-off pMOS, gate bias generator]
0.3µm, triple-metal CMOS process
VTH=0.2V, 100x100µm2
pumping freq=10kHz, 0.1µA (@VDD=0.5V)
Delay characteristics (inverter & NAND)
[Figure: measured Tpd [ns] (0 to 2) vs. VDD [V] (0 to 1.5) for an inverter and a 2-NAND, F.O.=3, comparing SCCMOS, MTCMOS, and conventional]
SCCMOS: 0.2V VTH circuit with 0.2V VTH cut-off MOSFET
MTCMOS: 0.2V VTH circuit with 0.6V VTH cut-off MOSFET
Conventional: all 0.6V circuit, no cut-off MOSFET
Dynamic Leakage Cut-off
[Figure: SRAM array (word lines VWL, VWL+1; bit line pairs VBL0 to VBLm-1) with address decoder, VNWELL driver, and VPWELL driver. Timing diagram of VWL, VNWELL, and VPWELL for selected and deselected rows, with voltage levels 2VDD, VDD, VSS, and -VDD]
Leakage Reduction of DLC SRAM
[Figure: ILEAK [A] (log scale, 10^-8 to 1) vs. VTH [V] (0 to 0.4), with and without DLC; memory capacity 1Mbit, VDD=1V, VTH=0.25V]
Total subthreshold leak of 1Mbit SRAM. At 1V VDD, VTH of the dormant cell is 0.25V while that of the active cell is 0V, keeping the total leakage power at 0.9mW.
Dynamic Leakage Cut-off (DLC) SRAM
[Chip photograph: address decoder, memory cells (MCs), well bias driver]
H.Kawaguchi and T.Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Power Reduction," IEEE J. of Solid-State Circuits, pp.807-811, May 1998.
Area Overhead of DLC SRAM
[Figure: area overhead (0 to 0.5) vs. # of selected bits at a time (16, 32, 64, 128); memory capacity 1Mbit]
Clustered Voltage Scaling for Multiple VDD’s
[Figure: conventional design vs. CVS structure, each a network of logic between flip-flops (FF) with the critical path marked; the CVS structure ends its lower-VDD portion in level-shifting F/F's. The lower-VDD portion is shown shaded.]
M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.
Once VL is applied to a logic gate, VL is applied to subsequent logic gates until F/F’s to eliminate DC current paths. F/F’s restore VH.
Slave-Latch Level-Conversion F/F
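[Figure: flip-flop schematic with input D, clock CLK (CK and its complement), a VL-powered master stage, and a VH-powered slave latch (M1, M2) producing Q]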
Dual-VS Scheme
[Figure: input pads, combinational logic (VH-cells and VL-cells), level-conversion flip-flops, clock tree, and output pads. Two DC-DC converters supplied from VDD generate VH and VL, each with a critical path replica (VSH: VH-cells; VSL: VL-cells) clocked by CLK]
Power Reduction vs. VL/VH
The optimum VL/VH is between 0.6 and 0.7 for any kind of path-delay distribution function.
[Figure: power reduction ratio (0 to 1.0) vs. VL/VH (0 to 1.0) for several path-delay distributions p(t)]
Path-delay Distribution in Dual-VS
[Figure: path-delay distributions p(t) vs. t, before and after, for MEF (1527 cells), MCB (1366 cells), VLD (3812 cells), DMA (1493 cells), DCT (5466 cells), RISC (5645 cells), MEC (2912 cells), IDCT (6227 cells), VLC (3462 cells)]
Clustered Voltage Scaling Technique
M.Takahashi et al., “A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme,” ISSCC, pp.36-37, Feb.1998.
[Figure: measured power dissipation (mW) (0 to 100), split into logic, F/F, clock, and memory, for three designs: 3.3V conventional; 2.5V VS & VTCMOS; 2.5V & 1.75V dual-VS & VTCMOS. Reduction labels: -43% (four times), -30%, -37%, -51%, ±0%]
Dynamic Voltage Scaling Loop
A.Chandrakasan & J.Rabaey in “Low-Power High-Speed LSI Circuits & Technology,” Realize-sha, 1998.
Temperature effects on IDS- VGS
Zero Temperature Coefficient(ZTC) point around VGS=1.0V
VZTC ≈ 1V
[Figure: measured IDS [mA] (0 to 0.4) vs. VGS [V] (0 to 2) for NMOS and PMOS as temperature increases from 0ºC to 120ºC; the curves cross at VZTC]
• Temp. coeff < 0 when VDD > VZTC
• Temp. coeff > 0 when VDD < VZTC
K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai,"Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's",CICC99, pp.563-566, May 1999.
Cause of positive temp. dependence of IDS
• α-power law model
$I_{DS} \propto \mu(T)\,(V_{DD} - V_{TH}(T))^{\alpha}$
$\mu(T) = \mu(T_0)\,(T/T_0)^{-m}$
$V_{TH}(T) = V_{TH}(T_0) - \kappa\,(T - T_0)$
(T = temperature, µ = mobility)
Typical values: α=1.5, m=1.5, κ=2.5 [mV/K]
Effects of VTH and µ on IDS when temp. goes up 100 [K]:
Condition | µ effect | VTH effect
VDD=2.5V, VTH=0.5V | 35% | 10%
VDD=1.0V, VTH=0.2V | 35% | 55%
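The two competing temperature effects in the α-power law model can be evaluated directly. The sketch below uses the typical values from the slide (α=1.5, m=1.5, κ=2.5 mV/K) and assumes T0 = 300 K, which the slide does not state. The Vth-effect percentages it produces depend on that assumption and on how the effect is normalized, so they are not expected to match the quoted 10% and 55% exactly.

```python
def mobility_ratio(t, t0=300.0, m=1.5):
    """mu(T) / mu(T0) = (T / T0)^-m."""
    return (t / t0) ** (-m)

def vth_at(t, vth0, t0=300.0, kappa=2.5e-3):
    """VTH(T) = VTH(T0) - kappa * (T - T0), kappa in V/K."""
    return vth0 - kappa * (t - t0)

def drive_ratio(v_dd, vth0, t, t0=300.0, alpha=1.5, kappa=2.5e-3):
    """Change of (VDD - VTH)^alpha caused by the VTH shift alone."""
    return ((v_dd - vth_at(t, vth0, t0, kappa)) / (v_dd - vth0)) ** alpha

def ids_ratio(v_dd, vth0, t, t0=300.0, alpha=1.5, m=1.5, kappa=2.5e-3):
    """IDS(T) / IDS(T0): mobility degradation times VTH-lowering gain."""
    return mobility_ratio(t, t0, m) * drive_ratio(v_dd, vth0, t, t0, alpha, kappa)

if __name__ == "__main__":
    for v_dd, vth0 in ((2.5, 0.5), (1.0, 0.2), (0.5, 0.2)):
        print(v_dd, vth0,
              f"mu: {1 - mobility_ratio(400.0):.0%} down,",
              f"VTH: {drive_ratio(v_dd, vth0, 400.0) - 1:.0%} up,",
              f"net IDS ratio {ids_ratio(v_dd, vth0, 400.0):.2f}")
```

At high VDD the mobility loss dominates and the current falls with temperature. At sufficiently low VDD the VTH reduction dominates and the current rises.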
Measurement of 32bit full adder
[Photograph of 32bit FA, 0.3µm CMOS]
[Figure: measured normalized tpd (0 to 4) vs. VDD [V] (0 to 2) at 20ºC, 50ºC, and 90ºC]
Transient response of chip temperature
[Figure: temperature [ºC] (20 to 180) vs. time [sec] (0 to 50) for VDD = 0.5V, VTH = 0.2V and for VDD = 3.3V, VTH = 0.5V]
• Same package
• Same power at room temp.
Better package is needed to avoid thermal runaway in low voltage.
K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai,"Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's",CICC99, pp.563-566, May 1999.
Careful temperature design for low-voltage
IDS and gate speed show positive temperature dependence in the VDD < 1V region. This will change the design validation process for worst-case conditions.
In low-VDD, low-VTH designs, temperature goes up much more than in high-VDD, high-VTH designs, even if the power consumption at room temperature and the package are the same.
D-type CMOS
K~1 (K=0.91 in this case)
D-type leakage cannot be neglected in the range VTH < -0.2V.
$t_{LH} = \frac{K C_L V_{DD}}{I_{PON} - I_{NOFF}} = \frac{K C_L V_{DD}}{I_{P(0)} - I_{N(LEAK)}}$
$t_{HL} = \frac{K C_L V_{DD}}{I_{NON} - I_{POFF}} = \frac{K C_L V_{DD}}{I_{N(0)} - I_{P(LEAK)}}$
$t_d = \frac{t_{LH} + t_{HL}}{2}$
[Figure: delay [ns] (0 to 0.4) and leakage power dissipation [W] (log scale, 10^-10 to 10^-4) vs. threshold voltage VTH [V] (-0.5 to 0.3) at VDD=0.5V, comparing SPICE, K CL VDD/ION, and K CL VDD/(ION - IOFF)]
Power Distribution in CMOS LSI's
[Figure: pie charts of power distribution among clock, logic, memory, and I/O for ASSP1, ASSP2, MPU1, and MPU2]
Power Distribution in Processor
[Figure: power distribution in a high-end µP among clock, datapath, memory, and I/O & synthesized logic]
Courtesy: Dr. Vivek Tiwari, Intel
o Synthesis for low-power is not so effective.
o Clock system is the key. In this respect, gated clock is one of the most efficient ways to reduce the power in current processors.
o Gated clock is useful in reducing average power but not that effective in reducing peak power.
o Circuit / device level is important.
Reduced Clock Swing Flip-Flop
(a) RCSFF: voltage swing of CLK is reduced to Vclk, down to 1V.
(b) Conventional F/F
[Figure: schematics of (a) RCSFF and (b) conventional F/F annotated with transistor sizes. RCSFF labels: CLK, D, Q, Wclk, P1, P2, N1, VWELL (3.3V~6V)]
H.Kawaguchi and T.Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Clock Power Reduction," in Symp. on VLSI Circuits '97, June 1997.
Layout Example
[Figure: layouts of (a) RCSFF and (b) conventional F/F, with CK and D marked; dimensions 24µm x 15µm and 20µm x 15µm]
Delay and power comparison
[Figure: clock-to-Q delay [ns] (0 to 1) vs. Vclk [V] (1 to 3) for the conventional F/F and for RCSFF with Wclk = 6.5µm, 10µm, 20µm]
[Figure: power per F/F [µW] (0 to 150) vs. Vclk [V] (1 to 3) for the conventional F/F and for RCSFF (Wclk = 10µm) with type A and type B clock drivers, VWELL = 3.3V and 6V]
[Figure: clock driver circuits type A (A1 to An) and type B, supplied from VDD and Vclk]
Modified Sense Amplifier-Based F/F
B.Nikolic et al., “Sense Amplifier-Based Flip-Flop,” ISSCC, pp.282-283, Feb.1999.
This can be used with RCSFF scheme.
Ultra Low-Voltage Operation
[Figure: Vout (V) vs. Vin (V) transfer curves for an inverter (T=300K), a 2-input NAND (T=300K), and an inverter (T=77K); labeled voltages 25mV, 50mV, 100mV, 140mV, 360mV]
J.Burr&J.Shott,"A 200mV Self-Testing Encoder/Decoder using Stanford Ultra-Low-Power CMOS",ISSCC94, pp.84-85.
(Stanford Univ.)
Ultra Low-Voltage Operation
Vth, Leff, tox Optimized Low-Power MOS
M.Kakumu et al.,"Low-Voltage and Power CMOS Technology", SSDM, 1995, pp.213-
SOI Processors in ISSCC’99
Paper#: WP25.1 | WP25.3 | WP25.7 | WP25.4
Company: IBM (East Fishkill) | IBM (Essex & Austin) | IBM (Rochester) | Samsung
Target: PowerPC 604e | PowerPC 750 | PowerPC | Alpha (second line: 32b for Apple, 64b, 64b)
PD/FD: PD | PD (SIMOX) | PD (SIMOX) | FD (SIMOX/Unibond no dep.)
Rule: 0.25um, 0.2um (Leff=0.12um), 0.25um
Interconnect: 5 Al + W local, Cu, 6 Cu, 4 Al
Area: 49mm2, 139mm2, 209mm2
# of Tr's: 6.5M, 34M, 9.7M
Freq.: 500MHz | 580MHz@85C, fast proc. | 550MHz | 600MHz
VDD: 1.7V | 2V | 1.8V | 1.5V (2V I/O)
Power: 5.1W @2V,400MHz, 24W, 40W
Speed gain ov: 25-30% | 20% | 20% | 30%@1.2V, 20%@1.5V SRAM
Notes: 22% Ctotal reduction; 12% by Cj; 15-20% simple gates; 10-15% more Ids; 15-25% by less body-bias; 25-40% complex gates
Hi-Speed is Low-Power
From URL: www.erniefernandez.com/html/soi.html
Advantage of SOI over Bulk CMOS
o Lower CJ and CGROUND achieve 20% lower CTOTAL. Good for hi-speed & low-power. (Less effective in interconnection-limited cases.)
o 10-15% higher IDS due to lower VTH in turning-on and parasitic bipolar current. (Effects are reduced at VDD=0.6V.)
o Lower negative body-bias effect in pass-gates and series-connected MOS's, as in NAND's, achieves higher IDS and hence hi-speed.
o Subthreshold swing s of 60mV/dec is achievable in FD and DTMOS. Lower VTH is possible with the same off-leak. (Less effective at lower VTH, like 0.1V.)
o Lower SER (normal dynamic gates)
o 25-30% higher speed in total for the 0.25um generation
Design Issues of PD-SOI
o History dependent delay (3-8% fluctuation)
o Pass-gate leakage by parasitic bipolar current (pull-down internal nodes)
o Lowered noise immunity in dynamic circuits (several techniques)
o Self-heating (only for circuits with DC current path. 4
o ESD protection (process/device & circuits remedies)
o Redesign efforts (higher for PD, lower for FD)
o Higher wafer cost
Dynamic Threshold MOSFET (DTMOS)
F.Assaderaghi et al., "A Dynamic Threshold Voltage MOSFET (DTMOS) for Very Low Voltage Operation", ED Letters, vol.15, no.12, Dec. 1994.
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
Pass Transistor Logic with SOI
[Figure: pass-transistor NMOS network with DTMOS; inputs A, B, C and their complements, complementary outputs Out]
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
For NMOS with VDD=0.5V:
Gate is 0.5V → Body bias=0.5V → Vth= -0.05V
Gate is 0V → Body bias=0V → Vth= 0.15V
Pass Transistor Logic with SOI
[Figure: active power (mW) (log scale, 0.1 to 1000) and frequency (0 to 1000) vs. supply voltage (V) (0 to 4) for bulk pass-gate and DTMOS pass-gate]
T.Fuse, et al. "A 0.5V 200MHz 1-Stage 32b ALU Using a Body Bias Controlled SOI Pass-Gate Logic," in ISSCC Dig. Tech. Papers, pp. 286-287, Feb., 1997.
DTMOS vs. Normal SOI
[Figure: DTMOS and SOI device structures]
• Suppose DTMOS ≈ front gate + back gate
• IDS/CG of back gate device < IDS/CG of front gate device.
• DTMOS needs body contact area. FD SOI can use larger W.
• Both can achieve s=60mV/dec.
• With the same leakage and area, which is really faster?
• DTMOS is good in driving large CLOAD.
• Pass transistor will show better performance with DTMOS.
CMOS Static vs. Pass-Transistor Logic
Reduced number of transistors leads to low-power, high-speed and reduced area.
[Figure: full adder (inputs A, B, C; outputs Sum, Cout) implemented in CMOS static logic (tr. count: 40) and in pass-tr. logic (tr. count: 28)]
History of Pass-Transistor Logic
[Figure: classification of complementary-output logic families by load (PMOS cross, CMOS inverter, CMOS latch, sense-amp., none) and by logic (NMOS logic with complementary gate input; pass-transistor logic with pass variable or VDD/VSS as drain input). Families: CVSL (IBM, 1984), DSL (Philips, 1985), CPL (Hitachi, 1993), DPL (Hitachi, 1993), DCVSPG (IBM, 1993), SRPL (Toshiba, 1994), SAPL (Toshiba, 1994)]
Various Pass-Transistor Logic Circuits
[Figure: AND/NAND gates (inputs A, B and complements; outputs A•B and its complement) in CPL, SRPL, and DCVSPG]
0.4µm device (full adder)
Circuit | Tr. count | Delay (ns) | Power (mW/100MHz) | P•D (normalized) | E•D (normalized)
CMOS static | 40 | 0.82 | 0.52 | 1.00 | 1.00
CPL | 28 | 0.44 | 0.42 | 0.43 | 0.23
DCVSPG | 24 | 0.53 | 0.30 | 0.37 | 0.24
SRPL | 28 | 0.48 | 0.19 | 0.21 | 0.13
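The normalized columns of the full-adder comparison can be reproduced from the delay and power columns. The sketch below assumes that P•D is power times delay and that E•D is P•D times delay again, both divided by the CMOS static values. That reading is an assumption, but it does reproduce the tabulated numbers.

```python
# (transistor count, delay in ns, power in mW/100MHz) from the full-adder table.
CIRCUITS = {
    "CMOS static": (40, 0.82, 0.52),
    "CPL": (28, 0.44, 0.42),
    "DCVSPG": (24, 0.53, 0.30),
    "SRPL": (28, 0.48, 0.19),
}

def normalized_products(name, baseline="CMOS static"):
    """Return (P*D, E*D) normalized to the baseline, rounded to two decimals."""
    _, delay, power = CIRCUITS[name]
    _, delay0, power0 = CIRCUITS[baseline]
    pd = (power * delay) / (power0 * delay0)
    ed = pd * (delay / delay0)          # assumed definition: E*D = P*D*D
    return round(pd, 2), round(ed, 2)

if __name__ == "__main__":
    for name in CIRCUITS:
        print(name, normalized_products(name))
```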
Pass-Tr. Logic Synthesis with BDD
BDD: Binary Decision Diagram
T.Sakurai and A.R.Newton, "Multiple-Output Shared Transistor Logic (MOSTL) Family Synthesized Using Binary Decision Diagram," Dept. EECS, Univ. of Calif., Berkeley, ERL Memo M90/21, Mar. 1990.
Truth table for f & f̄ (f = Sum):
c b a | f f̄
0 0 0 | 0 1
0 0 1 | 1 0
0 1 0 | 1 0
0 1 1 | 0 1
1 0 0 | 1 0
1 0 1 | 0 1
1 1 0 | 0 1
1 1 1 | 1 0
[Figure: BDD for function f and BDD for function f̄ over the variables a, b, c]
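BDD Reduction Rules
Rule 1: Collapse two nodes A1 and A2 whose right and left branches each point to the same node.
Rule 2: Eliminate a node A whose right and left branches point to the same node.
[Figure: before and after diagrams for Rule 1 (nodes A1 and A2 on variable x, pointing to nodes B and C, merged into one node) and Rule 2 (node A on variable x, with both branches pointing to B, removed)]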
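The two reduction rules can be sketched in a few lines. The code below builds a reduced ordered BDD by Shannon expansion, applying Rule 2 when a node is created and Rule 1 through a table of already-created nodes. The variable order (a, b, c from the root) and all names are assumptions for illustration. It is not the synthesis tool described in the cited memo.

```python
def build_robdd(func, num_vars):
    """Return (root, nodes). Terminals are 0 and 1; internal node ids start at 2.

    nodes[node_id] = (variable_index, low_child, high_child)
    """
    unique = {}   # (var, low, high) -> node id, used for Rule 1
    nodes = {}

    def make(var, low, high):
        if low == high:                  # Rule 2: both branches reach the same node
            return low
        key = (var, low, high)
        if key not in unique:            # Rule 1: share nodes with identical branches
            node_id = len(nodes) + 2
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def expand(var, assignment):
        if var == num_vars:
            return int(bool(func(*assignment)))
        low = expand(var + 1, assignment + (0,))
        high = expand(var + 1, assignment + (1,))
        return make(var, low, high)

    return expand(0, ()), nodes

def evaluate(root, nodes, assignment):
    """Follow the branches selected by the assignment down to a terminal."""
    node = root
    while node >= 2:
        var, low, high = nodes[node]
        node = high if assignment[var] else low
    return node

# Full-adder outputs used as examples.
sum_root, sum_nodes = build_robdd(lambda a, b, c: a ^ b ^ c, 3)
cout_root, cout_nodes = build_robdd(lambda a, b, c: (a & b) | (b & c) | (a & c), 3)

if __name__ == "__main__":
    print("Sum nodes:", len(sum_nodes), "Cout nodes:", len(cout_nodes))
```

An unreduced decision tree for three variables has seven internal nodes. After reduction the Sum function needs five and the carry needs four, and each remaining node corresponds to a pair of pass transistors when the diagram is mapped to a circuit.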
BDD Reduction Example
[Figure: BDDs for f and f̄ over a, b, c with terminals 1 and 0, shown before reduction and after two successive reductions by Rule 1, ending in a shared graph for f and f̄]
Mapping BDD to MOS Circuit
[Figure: the reduced BDD for f and f̄ is mapped to a MOS circuit, with terminal 1 → VDD and terminal 0 → VSS. Pass variables are then introduced: a node whose x branch goes to VDD and whose x̄ branch goes to VSS becomes pass variable x, and a node whose x branch goes to VSS and whose x̄ branch goes to VDD becomes pass variable x̄]
Approach to low-power LSI
Example of MPEG2 decoding
Processor (software): ~25W
DSP: ~4W
Dedicated system LSI (SW/HW): ~0.7W
[Figure axes: low-power vs. high flexibility]
Power * Area vs. Performance
[Figure: 16 bit performance (GOPS) (0 to 4) vs. power * area (W mm2) (log scale, 10 to 10000) for µP + multimedia extension, mediaprocessor for PC, and mediaprocessor for AV]
Homogeneous vs. Heterogeneous
[Figure: block diagrams. Homogeneous architecture (high flexibility): MPUs, memory, I/F, analog. Heterogeneous architecture (system LSI; low-power, more efficient): MPU, DSP, special engine, memory, I/F, analog]
DRAM Embedding
DRAM Processor System LSI
Two orders of magnitude improvement in bandwidth and power
K.Sawada, T.Sakurai, et al, "A 72K CMOS Channelless Gate Array with Embedded 1Mbit Dynamic RAM," in Proc. CICC'88, pp.20.3.1-20.3.4, May 1988.
Neural chip
3 orders of magnitude smaller power consumption for recognition compared to software implementation
[Chip photograph: synapse cells, threshold cells, sense amp, word driver, WTA, IO, read word and write word lines, read bit and write bit lines (true and complement), standard SRAM, output buffer (for neural networking)]
S.Takeuchi &T.Sakurai, ICCD’98, Oct.1998.
Energy of various operations
Integration (system LSI) is the key to low-power
Operation | Energy/Op (pJ)
Add | 7
3-2 Add | 2
Multiply | 40
Latch | 1.8
Internal read | 36
Internal write | 71
I/O | 80
External memory | 16000
B.M.Gordon, E.Tsern, T.Meng,"Design of a Low Power Video Decompression Chip Set for Portable Applications," J. of VLSI Signal Processing Systems 13, pp.125-142, 1996
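To see why integration matters, the per-operation energies above can be summed for a small workload. The operation mix below is a made-up example, not a measurement. Only the energy figures come from the table.

```python
# Energy per operation in pJ, taken from the table above.
ENERGY_PJ = {
    "add": 7,
    "add_3_2": 2,
    "multiply": 40,
    "latch": 1.8,
    "internal_read": 36,
    "internal_write": 71,
    "io": 80,
    "external_memory": 16000,
}

def total_energy_pj(op_counts):
    """Sum the energy of a workload given as {operation: count}."""
    return sum(ENERGY_PJ[op] * count for op, count in op_counts.items())

# Hypothetical multiply-accumulate step: two operand reads, one result write.
on_chip = total_energy_pj(
    {"multiply": 1, "add": 1, "internal_read": 2, "internal_write": 1})
# Same arithmetic, but all three memory accesses go to external memory.
off_chip = total_energy_pj(
    {"multiply": 1, "add": 1, "external_memory": 3})

if __name__ == "__main__":
    print(on_chip, off_chip, off_chip / on_chip)
```

In this example the arithmetic itself costs 47 pJ either way. The difference comes entirely from where the memory accesses go.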
Software-Hardware cooperation
A.Chandrakasan, R.Amirtharajah, S.H.Cho, J.Coodman, G.Konduri, J.Kurik, W.Rabiner and A.Wong, ”Design Considerations for Distributed Microsensor systems," CICC99, pp.279-286, May 1999.
StrongArm-1100
(Clock frequency control instruction equipped, an encryption algorithm)
o Code optimization for power -> factor of 5 power reduction
o Adaptive VDD control together with frequency control -> factor of 3 further power reduction
Important technologies for low-power
Low-voltage
• VTH control, multi-VTH, SOI, leakage control
• VDD control, multi-VDD, DC-DC conv.
• Ultra low voltage circuit (PLL, analog)
• Software control
Low-swing
• Bus, clock
Low-C
• Less # of Tr's, fused digital-analog, pass-transistor
• Low-k (air isolation)
• System on a chip, memory embedding
Low-α ƒ
• Locally synch.-globally asynch., gated clock
• Low transition coding
P = α ƒ C Vs VDD + leak power
Lorentz Force MOS (LMOS)
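Electrons are deflected by By, which produces a voltage difference between Vo1 and Vo2.
[Figure: MOSFET (gate, N+ source, N+ drain) next to a power supply line of width WP carrying current IP. The magnetic field By of the line exerts a Lorentz force F on electrons e- moving with velocity vx, and two N+ output contacts give Vo1 and Vo2]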
K.Nose and T.Sakurai,”Micro IDDQ Test using Lorentz Force MOSFET's,” Symp. On VLSI Circuits, June 1999.
Microphotograph of LMOS
10 parallel connection
Wp: 10µm, 8µm, 5µm, 2µm
Measured ∆VD dependence on IP
[Figure: ∆VD [µV] (1 to 4) vs. power supply current [mA] (0 to 10); WP=8µm, VDDT=VGT=2V]
∆VD is proportional to IP.
Circuit for micro IDDQ test
It is possible to measure the current of thousands of LMOS's.
Shift registers are used to control the gates of the LMOS's.
[Figure: shift register (D-Q stages clocked by CLK, started by VSTART) selecting LMOS sensors on the VDD lines of Macro1, Macro2, and Macro3; VDDT pad, VDD pad, output VD]
Low-Power CMOS LSI Circuit Techniques
[Table: techniques by circuit category (rows: General, Clock, Bus, Data Path, Random Logic, Memory, I/O) and by the factor of the power expression that is reduced (columns: CL, pt, VS, VDD, fCLK, ISC, IDC, ILEAK). Entries in slide order; trailing numbers are the slide's reference numbers.]
Low VDD
• gated clock
Glitch Suppress
Careful Design
• design verif. by CAD
• DC-DC conv. 1)
Small Signal
• 0.25V Q-Rail 2)
• device scaling
Charge Recycling
• C stacking 4)
• floorplan to reduce wire length
• F/F sizing
• C stacking 6)
• 1/2 swing 4)
• 1/4 swing 6)
• 3-state-buffer activated after data fix 5)
• exclusive bus
• latch insertion to deskew data-in 7)
Tr. Reduction
• pass-transistor (CPL, SRPL, SAPL, DPL, DCVSPG)
• pass-tr. (SAPL) 8)
• parallelism
CAD
• permutation of series-connected tr. order 9)
• library & CAD for pass-tr. logic 10)
• tr. sizing 11,12)
• current switch logic (MCML) 13)
• memory hierarchy
• reduced swing WL, BL
• MCM 17)
• area pad 17)
• reduced swing I/O 18) (GTL, LVDS)
• phase modulation 19)
Cut Current
• latch S/A 15)
• dynamic termination 20)
Sleep Mode
• 2 type VTH (MT-CMOS) 14)
• switched source-impedance 16)
• ∆VTH control 3)
• High VTH for standby 21)
• VTCMOS 24-27)
Reference for low-power design & System LSI
Low-power high-speed LSI design & technology
「低消費電力、高速LSI技術」
Realize publishing company, ¥56,000
Phone: +81-3-3815-8511, Fax: +81-3-3815-8529
System LSI – Applications and Technology
「システムLSIーアプリケーションと技術」
Science Forum publishing, ¥48,000
Phone: +81-3-5689-5611, Fax: +81-3-5689-5622
Three crises in VLSI designs
Power crisis
Interconnection crisis
Complexity crisis
Complex interconnect
Advances in interconnection technology
Interconnection in 1985 Interconnection in 1998
Interconnect determines cost & perf.
P: Power, D: Delay, A: Area, T:Turn-around
[Figure: four panels vs. year ('95 to '10). Power [%] (0 to 100): Cint vs. Cgate (Ctransistor). Delay [%] (0 to 100): RC delay vs. transistor delay (F.O.=3, Al=1mm). Process steps [%] (0 to 100): interconnect vs. transistors. # of interconnect layers (6 to 9) (SIA'97)]
Interconnect parameters trend
[Figure: εr, ρ [Ω・cm], aspect ratio, and width [x0.1μm] (0 to 4) vs. year (1996 to 2012), with the transition from Al to Cu]
Semiconductor Industry Association roadmap
http://notes.sematech.org/1997pub.htm
RC delay and gate delay
[Figure: delay (sec) (log scale, 10^-12 to 10^-8) vs. year (1996 to 2012): clock period, gate delay, and RC delay for interconnect lengths of 50µm, 100µm, 1mm, and 3mm]
Repeaters
[Figure: a) line without repeaters, b) line with repeaters. Interconnect delay (ns) (log scale, 0.1 to 1000) vs. interconnect length (cm) (0.1 to 10), with and without repeaters]
Tradeoff between power and delay
[Figure: delay D, power P, and power•delay PD (A.U., 0 to 3) vs. Prepeater / Pline (0 to 1)]
Delay optimization
P: power consumed by repeaters (Prepeater) is 0.6 times the power consumed by interconnect (Pline)
Power • Delay optimization
D: 9% increase from opt.
P: Prepeater / Pline is 26%
PD: 24% decrease from delay opt. case
The further, the less
[Figure: LSI and world compared as hierarchies: unit, block, group, company]
Local memories
Hierarchy
Locality in space & time
[Figure: CPU, 1st cache, 2nd cache, main mem., ext. mem.]
 | 1st cache | 2nd cache | Main mem. | Ext. mem.
Latency | 3ns | 20ns | 100ns | 10ms
Throughput | 3ns | 10ns | 30ns | 100ns
Use of local memories
Capacitive Coupling Noise
[Figure: ratio (0 to 4) vs. year (1996 to 2012) for C12/C20 and for peak couple noise / signal voltage; inset shows a line with capacitance C12 to each neighbor and C20 to ground]
Coupling noise in RC bus
[Figure: voltage V / E1 (0 to 1) vs. time t / RC (0 to 2) for the driven line V1 and the neighboring line V2 with peak noise Vp, for coupled distributed RC lines (R, C per line, coupling Cc) driven by a step of E1 at t=0; Cc / C = 1 in both panels]
Peak noise formulas, in slide order:
$V_p \approx \frac{C_c/C}{1 + 2C_c/C}$
$V_p = \frac{\sqrt{1 + 4C_c/C} - 1}{\sqrt{1 + 4C_c/C} + 1}$
$V_p \approx \frac{2C_c/C}{2 + 3C_c/C} \quad (C_c/C < 2)$
Cases labeled: bus, three lines.
H.Kawaguchi and T.Sakurai, "Delay and Noise Formulas for Capacitively Coupled Distributed RC Lines," ASPDAC, Digest of Tech. Papers, pp.35-43, Feb. 1998.
Differential
Noise on signal lines
Noise on power supply lines
Single-ended Differential
RI
'0' is higher
• Smaller margin in single-ended circuits • Erronous discharge in diynamic circuits• In-phase noise in differential circuits (no change in margin)
Air Isolation
Before ashing: Spt. SiO2 (50 nm), interconnect, carbon
After 450°C, 2 h furnace ashing: Spt. SiO2 (50 nm), interconnect, gas
Air Isolation
[Figure: delay (ps/stage, 200 to 450) vs. isolation material: SiOF (k=3.7), HSQ (k=2.2), Parylene-N (k=2.7), Gas (wire-wire), Gas (all). Percentages annotated on the plot: 26.7%, 41.7%, 20%, 49.5%.]
Coupling among Interconnection
Difficulty in checking setup and hold time.
[Figure: voltage (A.U., 0 to 1) vs. time (ns, 0 to 10) for lines with 0.1 µm space and 3 mm length. Switching patterns shown: 0→1 0→1 0→1; 0→0 0→1 0→0; 1→0 0→1 1→0. Curves labeled In-phase and Out-phase.]
Sense-amplified RC line
[Figure: normalized voltage V(l,t) / E (-0.2 to 1) vs. normalized time t/RC (0 to 10). Circuit diagram: RC line (R, C) feeding an SA-F/F; labels: CLK, D, Q, S, A.]
T.Sakurai, H.Kawaguchi and T.Kuroda, "Low-Power CMOS Design through VTH Control and Low-Swing Circuits," invited, 1997 International Symp. on Low-Power Electronics and Design, pp.1-6, Aug. 1997.
SA-F/F (Sense-Amplifying Flip-Flop) circuits
[Figure: SA-F/F circuit diagrams. Labels: CLK, D, Q, S, A, B, fP, DVin. Example shown: NMOS dynamic differential logic, XOR gate.]
Skin Effects for Signal Lines
[Figure: skin depth and interconnect width [m] (10^-8 to 10^-5) vs. frequency (Hz, 10^8 to 10^10); labels: skin depth, Cu wire, hi-end clock freq., low-end clock freq.]
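The skin-depth curve can be reproduced approximately from the standard expression δ = sqrt(ρ / (π f μ)). The sketch below assumes a bulk copper resistivity of about 1.7e-8 Ω·m and the permeability of free space; both are assumptions here, and the exact values used for the slide's plot are not known.

```python
import math

MU_0 = 4.0e-7 * math.pi   # permeability of free space, H/m
RHO_CU = 1.7e-8           # assumed bulk Cu resistivity, ohm*m (approximate)


def skin_depth(freq_hz: float, resistivity: float = RHO_CU,
               mu: float = MU_0) -> float:
    """Skin depth in meters: sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * mu))


if __name__ == "__main__":
    for f in (1e8, 1e9, 1e10):
        print(f, skin_depth(f))
```

With these assumptions the skin depth is on the order of a few micrometers at 1 GHz and falls with the square root of frequency, so it becomes comparable to wire dimensions only for wide or thick lines at high clock rates.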
Skin Depth and R Increase
[Figure: R / R0, the increase in R caused by the skin effect (1 to 100), vs. a/D (1 to 100). D: skin depth; a and D are marked on the conductor cross-section.]
Inductance?
・ At present, RC effects dominate LC effects because R > |jωL|.
・ In the future, both R and ωL will increase (R more rapidly?).
・ Exception: low-R lines.
・ Inductive effects in wide clock lines of a fast processor are claimed to have been observed in simulation.
・ Clock lines are placed on a power plane to reduce inductive effects.
[1] D.A.Priore, "Inductance on Silicon for Sub-micron CMOS VLSI," Symp. on VLSI Circuits, 1993.
[Figure: self-inductance L (nH/cm) vs. W / H on logarithmic axes, with T/H as the curve parameter]
$L = 2 \ln \dfrac{6H}{0.8W + T}$ (nH/cm)
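The comparison of R against ωL can be made concrete with the self-inductance expression on this slide, L = 2 ln(6H / (0.8W + T)) nH/cm. The sketch below is illustrative: it reads W, H, T as line width, height above the return plane, and line thickness (an assumption, since the slide does not define them), uses an assumed Cu resistivity, and rejects geometries where the logarithm is not positive.

```python
import math


def self_inductance_nh_per_cm(w: float, h: float, t: float) -> float:
    """L = 2 ln(6H / (0.8W + T)) in nH/cm; W, H, T in the same length unit."""
    ratio = 6.0 * h / (0.8 * w + t)
    if ratio <= 1.0:
        raise ValueError("expression not meaningful for this geometry")
    return 2.0 * math.log(ratio)


def omega_l_over_r(freq_hz: float, w_um: float, h_um: float, t_um: float,
                   resistivity_ohm_cm: float = 1.7e-6) -> float:
    """Ratio of inductive reactance to resistance, both per cm of wire.

    resistivity_ohm_cm defaults to an assumed Cu value (about 1.7 uohm*cm).
    """
    l_per_cm = self_inductance_nh_per_cm(w_um, h_um, t_um) * 1e-9  # H/cm
    area_cm2 = (w_um * 1e-4) * (t_um * 1e-4)                       # cm^2
    r_per_cm = resistivity_ohm_cm / area_cm2                       # ohm/cm
    return 2.0 * math.pi * freq_hz * l_per_cm / r_per_cm


if __name__ == "__main__":
    # Hypothetical 1 um x 1 um line, 1 um above its return plane, at 1 GHz.
    print(omega_l_over_r(1e9, 1.0, 1.0, 1.0))
```

For the hypothetical 1 µm line at 1 GHz the ratio is well below 1, consistent with the slide's point that R currently exceeds |jωL| for narrow lines.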
Inductive Effects
[Figure: ωL / R (10^-2 to 10^2) vs. year (1996-2012); curves: min. width (scaled), W=1 μm, W=10 μm, W=100 μm]
Inductive Effects in Clock Lines
Board design practice is being imported into LSI design.
P.J.Restle & A Deutsch, “Designing the Best Clock Distribution Network,” VLSI circuits symp., pp.2-3, May 1998.
Interconnect Cross-Section and Noise
Unscaled / anti-scaled
・ Clock
・ Long bus
・ Power supply
Scaled interconnect
・ Signal
1 V, 15 W -> 15 A current
5% noise -> 0.05 V noise -> 3 mΩ sheet R -> 10 µm thick Al
Area pad + package, or thick layer on board is needed.
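The chain of numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes a bulk aluminum resistivity of about 2.7e-8 Ω·m (not given on the slide) and treats the allowed resistance as a sheet resistance, so that thickness = resistivity / sheet resistance.

```python
RHO_AL = 2.7e-8  # assumed bulk Al resistivity, ohm*m (approximate)


def supply_budget(vdd: float, power: float, noise_fraction: float):
    """Return (supply current, allowed noise voltage, allowed resistance)."""
    current = power / vdd
    noise_v = noise_fraction * vdd
    resistance = noise_v / current
    return current, noise_v, resistance


def thickness_for_sheet_resistance(r_sheet: float,
                                   resistivity: float = RHO_AL) -> float:
    """Metal thickness in meters that gives the requested sheet resistance."""
    return resistivity / r_sheet


if __name__ == "__main__":
    current, noise_v, resistance = supply_budget(1.0, 15.0, 0.05)
    print(current, noise_v, resistance)
    print(thickness_for_sheet_resistance(3e-3))
```

With these assumptions the allowed resistance comes out near 3 mΩ and the required Al thickness near 9 µm, in line with the roughly 10 µm quoted on the slide.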
Possible solutions for interconnect issues
Architecture
• Hierarchical architecture, local memories (10~)
Circuit
• Repeater (5)
• Line width sizing (10)
• Sense amplifier (5)
• Interconnection pipelining (10)
• Differential circuit (10)
Device / Process
• Low-ρ (Cu 1.3 (10 for EM)), Low-ε (F 1.1, polymer 2, air 4)
• Multi-layer interconnection (un/anti-scaled layers 100)
• Area pads + thick package / board layers (10)
CAD
• R, C extraction, fast simulation (1000)
• Optimization (repeater insertion...)
Three crises in VLSI designs
Power crisis
Interconnection crisis
Complexity crisis
VLSI Design in 2010
Designing a map of 10 m wide roads for a world atlas
Complexity vs. Productivity
System LSI design complexity increases faster than productivity. (http://notes.sematech.org/97melec.htm)
[Figure: design complexity and productivity (log scale, 1 to 1000) vs. year (2000-2012); curves: design complexity, productivity improved with lots of development, productivity improved at the current rate]
Coping with complexity crisis
[Figure: system LSI assembled from blocks: MPU core, cache, ROM, RAM, MPEG core, USB core, proprietary logic. IP sources: IP (A inc.), IP (B univ.), IP (C inst.), IP (D semi.)]
IP: CPU, DSP, memories, analog, I/O, logic, ... (HW/FW/SW)
• Re-use and sharing of IP’s
• Design at high abstraction
Hot design topics initiate CAD tools
S/W, H/W Co-design
Behavioral
RTL
Logic
Circuit
Physical (deep submicron)
New dimensions
• LSI/package/board
• Power
• RC delay
• Signal integrity
• Interconnect reliability
• Noise
• IR drop
• Distribution of parameters
• Memory embedding
• Analog-digital mix...
Total system design
LSI in 2014
Item                       Unit        1999    2014    Factor
Design rule                µm          0.18    0.035   0.2
Tr. density                /cm2        6.2M    390M    30
Chip size                  mm2         340     900     2.6
Tr. count per chip (µP)                21M     3.6G    170
DRAM capacity                          1G      1T      256
Local clock on a chip      Hz          1.2G    17G     14
Global clock on a chip     Hz          1.2G    3.7G    3.1
Power                      W           90      183     2.0
Supply voltage             V           1.5     0.37    0.2
Current                    A           60      494.6   8
Interconnection levels                 6       10      1.7
Mask count                             22      28      1.3
Cost / tr. (packaged)      µcents      1735    22      0.01
Chip to board clock        Hz          500M    1.5G    3.0
# of package pins                      810     2700    3.3
Package cost               cents/pin   1.61    0.75    0.5
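Some rows of this table follow from others, which gives a quick consistency check. The sketch below recomputes the supply current as power divided by supply voltage and the transistor count as density times chip area, using only values listed in the table; the dictionary keys are hypothetical names chosen for this illustration.

```python
# Values copied from the table; key names are illustrative only.
ROADMAP = {
    1999: {"density_per_cm2": 6.2e6, "chip_mm2": 340.0,
           "power_w": 90.0, "vdd_v": 1.5},
    2014: {"density_per_cm2": 390e6, "chip_mm2": 900.0,
           "power_w": 183.0, "vdd_v": 0.37},
}


def supply_current(row: dict) -> float:
    """Current in amperes: power divided by supply voltage."""
    return row["power_w"] / row["vdd_v"]


def transistor_count(row: dict) -> float:
    """Transistors per chip: density (/cm2) times chip area (mm2 -> cm2)."""
    return row["density_per_cm2"] * row["chip_mm2"] / 100.0


if __name__ == "__main__":
    for year, row in ROADMAP.items():
        print(year, supply_current(row), transistor_count(row))
```

The recomputed currents match the table's 60 A and 494.6 A, and the recomputed transistor counts land close to the listed 21M and 3.6G.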
Chip in 2014
Sensors
Micro-actuators (for display)
• Sensors/actuators on chip
• 0.035µm 3.6G Si FET’s with VTH & VDD control
• Locally synchronous 17GHz clock, globally asynchronous
• Chip / Package / Board system co-design
Lots of IP’s (µP, mem, analog, ...)
Programmable array of macros
Summary
Scaling law indicates power, interconnection and complexity crises.
Low-voltage + threshold control and less-waste design for low-power
Process, design guidelines and local memory for interconnection issues
Design reuse and sharing + software programmability for complexity