FEMTO-JOULE SWITCHING: Review of Low Energy Approaches for the Nano Era
Jabulani Nyathi, Washington State University
Valeriu Beiu, Washington State University
Snorre Aunet, University of Oslo, Norway
With credits to Joel Birnbaum (HP), Hugo De Man (IMEC/KUL), Kaushik Roy (Purdue), Mark Lundstrom (Purdue), Vojin G. Oklobdzija (UC Davis), Takayasu Sakurai (University of Tokyo), Tadahiro Kuroda (Keio University), Anantha Chandrakasan (MIT), Richard Brown (Univ. of Utah), and the ITRS Roadmap
2
Motivation
3
Where are we going?
Toward pervasive information systems
[Figure: penetration versus year, 1960 to 2000. Labels: mainframes, minis, micros; batch computing and timesharing; distributed computing; networked personal computing; open systems of clients and servers; cooperative computing; information appliances; information utility.]
Utility: The ability, capacity or power…to satisfy the needs or gratify the desires of the majority or of the human race as a whole (Oxford English Dictionary)
Appliance: a thing applied as a means to an end (Oxford English Dictionary)
4
How to get there? The very big picture
[Figure: timeline from 1960 to 2010. Hardware labels: gate, opamp, memory, IC, filters, AD/DA, RF, ASIC, FSM, RT-ops, µC, µP, DSP, ASIP, FPGA, IP, system on board, system on silicon. Software labels: design software, VHDL, embedded C, C++, OO, services, network.]
5
How to get there? The very small picture
[Figure: ID(on) and ID(off) versus year, 1990 to 2016, for 10 nm scale MOSFETs. Current scale marks: 0.00001 A, 10 A, 1000 A. Annotations: "10X increase per technology node" and "1.2 nm".]
6
As the electrons vanish
Information is a physical entity
– Rolf Landauer, IBM
Therefore, computation is a physical process
[Figure: Scaling of electronic devices. Number of chip components (10^2 to 10^18) versus feature size in microns (10^1 to 10^-3). Labels: classical age, historical trend (CMOS, 1970 to 2005), SIA Roadmap 2010.]
[Figure: Vanishing electrons. Electrons per device (10^-1 to 10^4) versus year (1990 to 2020), annotated with transistors per chip: 4M, 16M, 64M, 256M, 1G, 4G, 16G.]
Power cost of information transfer?
$$P = n \, k_B \, T \, \frac{d}{c} \, f^2$$
where P = power, k_B = Boltzmann constant, T = temperature, d = transmission distance, c = speed of light, f = operating frequency, n = number of parallel operations
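The scaling behaviour of this expression can be checked numerically. The sketch below assumes the formula reads P = n·kB·T·(d/c)·f², which is a reconstruction from the garbled slide text; the function name and the example operating point are hypothetical.

```python
# Standard physical constants (SI units).
K_B = 1.380649e-23        # Boltzmann constant, J/K
C_LIGHT = 299_792_458.0   # speed of light, m/s


def transfer_power(n, temperature_k, distance_m, freq_hz):
    """Power cost of information transfer, assuming P = n*kB*T*(d/c)*f^2.

    The form of the equation is an assumption reconstructed from the slide.
    """
    return n * K_B * temperature_k * (distance_m / C_LIGHT) * freq_hz ** 2


# Hypothetical operating point: 1e6 parallel operations, 300 K, 1 cm, 1 GHz.
p_1ghz = transfer_power(1e6, 300.0, 0.01, 1e9)
p_2ghz = transfer_power(1e6, 300.0, 0.01, 2e9)
print(f"1 GHz: {p_1ghz:.3e} W, 2 GHz: {p_2ghz:.3e} W")
```

Under this form, doubling the operating frequency quadruples the power, while the number of parallel operations and the distance enter only linearly.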
7
Power Power Power
8
The trend: power, VDD, and current
[Figure: voltage (V, 0 to 2.5), power per chip (W), and VDD current (A) versus year, 1998 to 2014. Curves: current, voltage, power.]
9
How should we deal with power and speed?
Device level: devices must have low threshold voltages and reduced parasitic capacitances or, better yet, be new devices
• Examples include fully and partially depleted silicon-on-insulator CMOS
• Novel nano devices (e.g., single electron transistors, molecular devices, spin transistors)
Gate level: logic design styles that include
• Standard CMOS
• Domino logic
• Differential logic families
• Pseudo nMOS and many more
• Threshold logic
Circuit level: clock gating, current sensing, etc.
Module level: will inherit the gains achieved at device, circuit and gate levels and manage these by employing innovative architectures (e.g., reduce switching activity)
Chip level: asynchronous communication, optical interconnects
10
Sources of power dissipation
Power has been a secondary design issue to speed
Device miniaturization and voltage scaling have led to:
• Fast switching speeds
• High density designs
• High leakage currents and, ultimately, increased power dissipation
In deep sub-micron (i.e. nano), the conflicting issues of high speed and low power are becoming even more prominent.
11
Past techniques for power reduction
Voltage/frequency scaling
• Limited by technology. Not possible below a certain feature size.
Architectural adaptation
• Shut off portions of core when not needed
• Dynamic speculation control
• Reconfigurable caches
Limitations:
• Very few choices to make
• Only dynamic power being saved
• Has associated overhead
12
TransMeta Example
13
Expression for average power
Sufficient details of the currents drawn must be studied to allow for a detailed power analysis. The average total power in digital CMOS circuits can be described by:
Ptotal = Pdynamic + Pshort_circuit + Pstatic
The dynamic power component, and methods to manage it, have seen a fair share of analysis.
14
Power component expressions
Each component of the average power can be analyzed further as follows:
Pdynamic = α • VDD • Vswing • CL • fCLK
With VDD being the supply voltage, Vswing the output/internal node voltage swing, CL the load capacitance, fCLK the switching rate of the output, and α the activity factor.
Pshort_circuit = α • Isc_ave • Vswing
Isc_ave is the average short-circuit current over a period. α is included because short-circuit currents occur only when the outputs switch.
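As a quick illustration, the two expressions on this slide translate directly into code. This is a minimal sketch: the function names and the example operating point (activity factor, voltages, capacitance, frequency) are hypothetical and not taken from the slides.

```python
def dynamic_power(alpha, vdd, vswing, c_load, f_clk):
    """Pdynamic = alpha * VDD * Vswing * CL * fCLK (watts, SI inputs)."""
    return alpha * vdd * vswing * c_load * f_clk


def short_circuit_power(alpha, isc_ave, vswing):
    """Pshort_circuit = alpha * Isc_ave * Vswing, as written on the slide."""
    return alpha * isc_ave * vswing


# Hypothetical node: alpha = 0.1, VDD = 1.2 V, full swing, CL = 10 fF, 1 GHz.
full_swing = dynamic_power(0.1, 1.2, 1.2, 10e-15, 1e9)
half_swing = dynamic_power(0.1, 1.2, 0.6, 10e-15, 1e9)
print(f"full swing: {full_swing:.3e} W, half swing: {half_swing:.3e} W")
```

With a full-swing output the dynamic term is quadratic in VDD; halving only the swing halves it, which is the lever that reduced-swing logic styles rely on.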
15
The static power … becomes important!
The third component of the average power equation is:
Pstatic = Psub_leakage + PDC
Where
• Psub_leakage is due to sub-threshold leakage
• PDC is due to DC current
For nano-electronics it is expected that the static component of power will be comparable to the dynamic power dissipation.
Standby power (Psub_leakage), a component of static power, will be the culprit as devices scale.
16
Example: Reducing dynamic power
Pdynamic = α • CL • VDD • Vswing • fCLK
Reduce switching activity:
• Conditional clock
• Conditional precharge
• Switching off inactive blocks
• Conditional execution
Run it slower:
• Use parallelism
• Fewer pipeline stages
• Use double-edge flip-flops
Technology scaling:
• The highest win
• Thresholds should scale
• Leakage starts to bite
• Dynamic voltage scaling
Reducing the active load:
• Minimize the circuits
• Use more efficient design
• Charge recycling
• More efficient layout
17
Is there an optimal design point?
18
Power dissipation and circuit delay
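Power: $P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{th}/S} \cdot V_{DD}$
Delay: $\dfrac{k \cdot Q}{I} = \dfrac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^{\alpha}}$, with $\alpha = 1.3$
[Figure: two surface plots versus VDD (V) and Vth (V), with operating points A and B marked. Left: power (W), scale up to 1 x 10^-4. Right: delay (s), scale up to 5 x 10^-10.]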
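The trade-off between the two expressions on this slide can be explored with a short sketch. The code assumes the power and delay models as read from the slide (a dynamic term plus a sub-threshold leakage term, and a delay with exponent 1.3); every default parameter value below (pt, fCLK, CL, I0, S, k) is a hypothetical placeholder rather than data from the presentation.

```python
def total_power(vdd, vth, pt=0.5, f_clk=1e9, c_load=1e-14, i0=1e-6, s=0.1):
    """P = pt*fCLK*CL*VDD^2 + I0*10^(-Vth/S)*VDD.

    s is the sub-threshold swing in volts per decade (assumed 0.1 V here).
    """
    dynamic = pt * f_clk * c_load * vdd ** 2
    leakage = i0 * 10.0 ** (-vth / s) * vdd
    return dynamic + leakage


def gate_delay(vdd, vth, k=1.0, c_load=1e-14, alpha=1.3):
    """Delay = k*CL*VDD / (VDD - Vth)^alpha; k is assumed to carry the units."""
    if vdd <= vth:
        raise ValueError("this delay model only applies for VDD > Vth")
    return k * c_load * vdd / (vdd - vth) ** alpha


# Lowering Vth at fixed VDD: the gate gets faster but leaks more.
for vth in (0.4, 0.2):
    print(vth, total_power(1.0, vth), gate_delay(1.0, vth))
```

Lowering Vth at a fixed VDD shortens the delay and raises the leakage term, which is the tension the two surface plots show.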
19
Power-delay product, energy-delay product
Power-delay product is a misleading metric, as it favors a processor that operates at lower frequency
Energy-delay is adequate, but energy-delay^2 should be used instead
Lowest voltage, highest threshold: no optimum
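A small sweep shows why the choice of metric matters. This is a sketch under stated assumptions: energy per operation is taken as CL·VDD² (dynamic energy only), delay follows the k·CL·VDD/(VDD − Vth)^1.3 model from the previous slide, and the Vth value, the normalized constants, and the VDD grid are hypothetical.

```python
def energy_per_op(vdd, c_load=1.0):
    """Dynamic switching energy, normalized: E = CL * VDD^2."""
    return c_load * vdd ** 2


def gate_delay(vdd, vth=0.4, k=1.0, c_load=1.0, alpha=1.3):
    """Normalized delay model: k*CL*VDD / (VDD - Vth)^alpha."""
    return k * c_load * vdd / (vdd - vth) ** alpha


def pdp(vdd):
    """Power-delay product, i.e. energy per operation."""
    return energy_per_op(vdd)


def edp(vdd):
    """Energy-delay product."""
    return energy_per_op(vdd) * gate_delay(vdd)


def ed2p(vdd):
    """Energy-delay^2 product."""
    return energy_per_op(vdd) * gate_delay(vdd) ** 2


# Candidate supply voltages from 0.50 V to 2.00 V in 50 mV steps.
VDDS = [round(0.5 + 0.05 * i, 2) for i in range(31)]

for name, metric in (("PDP", pdp), ("EDP", edp), ("ED2P", ed2p)):
    print(name, "is minimized at VDD =", min(VDDS, key=metric))
```

Under these assumptions the power-delay product is minimized at the lowest supply voltage in the sweep, however slow the circuit becomes there, while the energy-delay and energy-delay^2 products have interior minima that move to higher VDD as delay is weighted more heavily.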
20
Energy-delay^2
21
Lowering VDD to achieve ultra-low power
Energy consumption is proportional to the square of VDD.
VDD should be lowered to the minimum level which ensures the real-time operation.
[Figure: normalized power versus normalized workload (both 0.0 to 1.0), comparing variable Vdd with fixed Vdd.]
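The gap between the two curves can be reproduced with a deliberately simple model. This is a sketch, not the model behind the figure: it assumes that a fixed-Vdd system runs at full speed and then idles at zero power, that a variable-Vdd system scales clock frequency in proportion to workload, and that the achievable frequency is proportional to Vdd (threshold voltage ignored). The function names and the optional voltage floor are hypothetical.

```python
def fixed_vdd_power(workload):
    """Normalized power at fixed Vdd: active time scales with workload."""
    return workload


def variable_vdd_power(workload, v_floor=0.0):
    """Normalized power with Vdd scaled down to just meet the workload.

    Assumes frequency is proportional to Vdd, so the normalized Vdd equals
    the workload (but never drops below v_floor). Power ~ f * Vdd^2.
    """
    vdd = max(workload, v_floor)
    return workload * vdd ** 2


for w in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"workload {w:.1f}: fixed {fixed_vdd_power(w):.3f}, "
          f"variable {variable_vdd_power(w):.3f}")
```

In this simplified model the variable-Vdd power falls with the cube of the workload, so at half load it is 0.125 of the peak compared with 0.5 for fixed Vdd; a real design would see a smaller gain because Vdd cannot scale all the way to zero.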
22
Aggressively lowering VDD + Vth
If VDD and Vth are dynamically scaled, the advantage is obvious
23
The future: sub-threshold and body bias?
24
A fresh look at leakage currents
Some device and circuit level techniques for leakage current reduction are:
Dynamic threshold transistors (DTMOS)
• Technique permits the body voltage to be switched with the gate voltage.
• High threshold voltages in standby mode result in low leakage currents.
• Low threshold voltages in active mode allow for higher current drives (high speed).
Multi-threshold CMOS (MTCMOS)
• A high threshold voltage device is placed in series with low threshold MOS devices.
• Devices in the critical path are assigned low threshold voltages to allow for high gate speeds.
• Devices that are not in the critical path are assigned high threshold voltages to dissipate minimum leakage power in standby mode.
Digital sub-threshold voltage
• Devices operate in the sub-threshold region (Vgs < |Vth|).
• Technique is suitable for ultra low power applications where speed is of secondary importance.
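The leverage that a higher standby threshold gives can be estimated from the exponential dependence of sub-threshold leakage on Vth. The sketch below assumes leakage proportional to 10^(−Vth/S), as in the power expression earlier in the deck; the swing S = 100 mV/decade and the threshold shifts are hypothetical example values.

```python
def leakage_ratio(delta_vth, s=0.1):
    """Leakage after raising Vth by delta_vth volts, relative to before.

    Assumes sub-threshold leakage ~ 10^(-Vth/S), with S in volts per decade.
    """
    return 10.0 ** (-delta_vth / s)


# Raising the threshold in standby (as DTMOS and MTCMOS do) by 100-300 mV.
for delta in (0.1, 0.2, 0.3):
    print(f"+{delta:.1f} V on Vth -> leakage x {leakage_ratio(delta):.4f}")
```

With these assumed numbers, each additional 100 mV of threshold voltage in standby cuts leakage by one decade, which is why both DTMOS and MTCMOS aim for a high effective Vth when idle and a low one when active.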
25
Various DTMOS configurations
[Figure: DTMOS inverter configuration, with terminals Vin, Vout and VDD.]
DTMOS:
• Allows for control of the bulk terminal
• Good for low voltage operation (VDD < 0.6 V)
26
Basic MTCMOS architecture
[Figure: low-VTH circuits (high leakage) and high-VTH circuits (low leakage), shown with critical paths and non-critical paths.]
27
MTCMOS circuits configuration
MTCMOS:
• Low Vth in active mode
• Power supply is disconnected through the high Vth device in standby mode
• Extra high Vth memory circuit needed if data retention is necessary in standby mode
[Figure: two MTCMOS configurations, each with low Vt devices or logic in series with a high Vt device controlled by Vsleep. Labels in the first: VDD, V_HIGH. Labels in the second: VDD, VGND.]
28
Digital sub-threshold circuits
• Improved characteristics, including higher gain, better noise margin, and better energy efficiency
• Ratio-ed logic (pseudo/true-NMOS) compared to CMOS logic in terms of switching and power
Pseudo nMOS:
• Switches faster
• Draws high currents (DC currents are dominant)
• Dissipates more power
Both CMOS and pseudo-nMOS sub-threshold logic are easy to design and more efficient than other known ultra-low power logic, such as energy-recovery logic
29
Ring oscillator configurations
• Brown et al. have compared floating body and DTMOS inverters. Body conditioning is expected to yield superior results.
• Our ring oscillators use both conventional and adaptive body biasing.
30
Ring oscillators @ different nodes (PDP)
Node, VDD         Style           Wp (nm)  Wn (nm)  Delay (ns)  Current (nA)  Speed gain  Power (nW)  PDP (fJ)  EDP (pJ*ns)
250 nm, 450 mV    CMOS            3900     1500     296.90      286           1.00        26          7.64      2.26897
250 nm, 450 mV    Pseudo nMOS     1250     1500     183.00      480           1.62        43          7.90      1.44672
250 nm, 450 mV    Pseudo + Swap   1500     1500     146.50      2800          2.03        252         36.91     5.40848
180 nm, 450 mV    CMOS            3375     1080     176.70      270           1.00        24          4.30      0.76040
180 nm, 450 mV    Pseudo nMOS     900      1080     75.50       688           2.34        62          4.67      0.35316
180 nm, 450 mV    Pseudo + Swap   1080     1080     62.40       3055          2.83        275         17.15     1.07058
130 nm, 300 mV    CMOS            450      780      4.30        2600          1.00        156         0.67      0.00288
130 nm, 300 mV    Pseudo nMOS     450      780      2.50        5100          1.72        306         0.76      0.00191
130 nm, 300 mV    Pseudo + Swap   450      780      2.40        5450          1.79        327         0.78      0.00188
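The derived columns of this table can be recomputed from the delay and power columns. The sketch below uses the 130 nm, 300 mV rows as read from the table; the helper names are hypothetical, and small differences from the printed values are expected because the power column is rounded.

```python
# (style, delay in ns, power in nW) for the 130 nm rows at VDD = 300 mV.
ROWS_130NM = [
    ("CMOS", 4.30, 156.0),
    ("Pseudo nMOS", 2.50, 306.0),
    ("Pseudo + Swap", 2.40, 327.0),
]


def pdp_fj(delay_ns, power_nw):
    """Power-delay product in fJ: nW * ns = 1e-18 J, so divide by 1000."""
    return power_nw * delay_ns / 1000.0


def edp_fj_ns(delay_ns, power_nw):
    """Energy-delay product in fJ*ns."""
    return pdp_fj(delay_ns, power_nw) * delay_ns


def speed_gain(delay_ns, reference_delay_ns):
    """Speed-up relative to the reference (CMOS) ring oscillator."""
    return reference_delay_ns / delay_ns


cmos_delay = ROWS_130NM[0][1]
for style, delay, power in ROWS_130NM:
    print(style, round(pdp_fj(delay, power), 3),
          round(edp_fj_ns(delay, power), 3),
          round(speed_gain(delay, cmos_delay), 2))
```

For these rows the recomputed PDP and speed gain agree with the tabulated values to within rounding, and the recomputed EDP comes out roughly a thousand times larger than the tabulated number when expressed in fJ*ns, which suggests the tabulated EDP values are in pJ*ns.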
31
32
The best of both worlds?
33
Effect of using different circuit styles
34
How are logic design styles affected?
LOGIC STYLE                        Pdynamic                               Pshort_circuit           PDC                Pleakage
                                   CL    VDD   Vswing   (unlabeled)       Isc*VDD                  IDC*VDD            Isc*e^(-Vt/vT)*VDD
Standard CMOS                      3CL   VDD   VDD      1.5               1X [0 if VDD≤Vtn+Vtp]    0                  1
Domino                             CL    VDD   VDD      2                 1X [0 if VDD≤Vtn+Vtp]    0                  1
Pass Transistor                    CL    VDD   VDD-Vt   0.4               0                        0                  1
Differential (standard)            2CL   VDD   VDD      4                 2X [0 if VDD≤Vtn+Vtp]    0                  2
Differential w/ charge recycling   2CL   VDD   VDD/2    2                 2X [0 if VDD≤Vtn+Vtp]    0                  2
Pseudo nMOS                        CL    VDD   VDD-Vt   0.4               1X [0 if VDD≤Vtn]        1X [0 if VDD≤V]    1
35
The interconnection dilemma
“THE FAULT, DEAR BRUTUS, LIES NOT IN OUR GATES, BUT IN OUR WIRES.”
– with apologies to W. Shakespeare and J. Caesar
Instead of conclusions … Where is CL?
[Figure: cross-section of the interconnect stack, from the silicon wafer up through Metal 1 to Metal 7.]
36