Hasler IIT Lecture2 2009a
Transcript of Hasler IIT Lecture2 2009a
-
8/10/2019 Hasler IIT Lecture2 2009a
-
Power Efficient Computing
Cortical Neurons 1000s of inputs,
1000s of channel populations,
one output
Equivalent computation ~
400 MMAC / neuron (no learning / growth)
~ roughly 20pW / neuron
~ 500TMAC
< 10000 neurons
~100kW (comp) with 4000 DSPs
400MMAC / neuron at 20pW
digital is quite far away (100 mW); analog VMM is closer (100 µW)
analog HMM / dendrites get close
Useful Analog must be
Programmable / Configurable
Custom Analog ~ 1000 - 10000 times
more efficient than Custom Digital
(Mead 1990)
Portable Devices: battery powered (or less)
larger systems
minimize battery size / weight
Get as much computation
as possible
Analog (VMM): 10 MMAC/µW    Digital: 4 MMAC/mW (DSP)
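The power figures on this slide follow from dividing a workload by an efficiency. The sketch below redoes that arithmetic under two assumptions that are not stated outright on the slide: the MMAC figures are rates per second, and the analog VMM efficiency is 10 MMAC/µW (the reading consistent with the "= 10 TMAC / W" equivalence given on a later slide). The function and constant names are illustrative.

```python
# Efficiencies quoted on the slide, converted to MAC/s per watt.
DSP_MAC_PER_S_PER_W = 4e6 / 1e-3           # 4 MMAC/mW (DSP)
ANALOG_VMM_MAC_PER_S_PER_W = 10e6 / 1e-6   # 10 MMAC/uW, assumed reading (= 10 TMAC/W)


def power_w(mac_per_s, mac_per_s_per_w):
    """Power in watts needed to sustain a MAC rate at a given efficiency."""
    return mac_per_s / mac_per_s_per_w


NEURON_LOAD = 400e6    # 400 MMAC per neuron, taken as MAC/s
SYSTEM_LOAD = 500e12   # 500 TMAC, taken as MAC/s

dsp_neuron_w = power_w(NEURON_LOAD, DSP_MAC_PER_S_PER_W)
analog_neuron_w = power_w(NEURON_LOAD, ANALOG_VMM_MAC_PER_S_PER_W)
dsp_system_w = power_w(SYSTEM_LOAD, DSP_MAC_PER_S_PER_W)

print(f"DSP, one neuron:        {dsp_neuron_w * 1e3:.0f} mW")
print(f"Analog VMM, one neuron: {analog_neuron_w * 1e6:.0f} uW")
print(f"DSP, 500 TMAC system:   {dsp_system_w / 1e3:.0f} kW")
```

Under these assumptions the DSP needs about 100 mW per neuron and about 125 kW for the 500 TMAC system, in line with the slide's "100 mW" and "~100 kW", while the analog VMM lands at tens of microwatts per neuron.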
-
History of Digital System Design
VLSI taught
In CMOS
2000
FPGAs
In classes
Mead &
Conway
First
Synthesis
classes
1970
1980 1990
1960
4004 Intel
First IC
First CAD
(Fairchild, 1967)
TMS 32010(NMOS)
Magic (CAD ventures) Synthesis tools
First VLSI
courses
Speak and Spell
(first DSP?)
TI C54
(fixed
point)
MOSIS
XC2064
Pentium (Intel)
(0.8um)
MIPS
Handcrafted Design
Every Gate Optimized
Cost only feasible for
government contracts
A separation of design from technology
(build framework for abstraction)
technologists know how to fabricate
smaller and faster transistors
designers know how to coordinate
millions of transistors
-
Reconfigurable Signal Processing
Innovation and Process Scaling moves
solutions towards programmability
and reconfigurability
Cost
100% S/W
(Programmable)
100% H/W
(Fixed Function)
Tech
trend
Obtaining data for 4MMAC computation ~ 4mW
DSPs Low Power Processing
- cell phones (processing < 30 mW average)
- hearing aids (1 mW levels)
(AMI / DSP factory)
Power: 54C series 4MMAC/mW
FPGAs Large Configurability
Power: Just MAC engine
around 2-10MMAC/mW
Baseline static power ~ 0.5W to 1 W
Signal routing power / memory: ?
Power does not include comm off chip
(i.e. accessing memory)
Power = $\tfrac{1}{2} C V_{dd}^2 f$ for CMOS
Chip to Chip (10 pF load min, 2.5 V):
32 µW/Mbit (dynamic)
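The chip-to-chip figure can be checked from the CMOS dynamic power expression. This is a minimal sketch, assuming the switching coefficient is 1/2 and that 1 Mbit/s corresponds to a switching frequency of 1 MHz; both are assumptions chosen because they reproduce the quoted value. The function name is illustrative.

```python
def dynamic_power_w(load_farads, vdd_volts, freq_hz):
    """CMOS dynamic power, P = (1/2) * C * Vdd^2 * f (coefficient assumed)."""
    return 0.5 * load_farads * vdd_volts ** 2 * freq_hz


# Slide values: 10 pF minimum load, 2.5 V swing, 1 Mbit/s taken as 1 MHz.
chip_to_chip_w = dynamic_power_w(10e-12, 2.5, 1e6)
print(f"{chip_to_chip_w * 1e6:.2f} uW per Mbit/s")
```

With these inputs the result is 31.25 µW per Mbit/s, which rounds to the 32 µW/Mbit quoted on the slide.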
-
Modern System Design
Design at
Multipliers and Adders
When building analog systems,
we expect to build primitives at the basic algorithm level....
Analog = programmable and configurable.
How to get enough analog engineers
Design at
gate level Design at Basic Algorithms
Vector-Matrix Multiplication
Frequency Decomposition
Adaptive Filters
Classifiers (NN, GMM, HMM)
Hierarchy is a key ingredient in the
success of digital circuits and, until
recently, one reason why large analog
designs have been difficult
(1837)
Fixed function Digital
Fixed function Analog
Programmable
Digital (Mixed mode)
-
Levels of Energy Efficiency
Subthreshold Transistor Operation
Programmable Circuits (FG transistors)
Analog Signal Processing
Configurable Signal Processing
Highest throughput / amount of power
Eliminate mismatch
Programmability
Wide accessibility
~ x1000 improvement in power efficiency
Moving analog approaches / conceptual framework to a system design approach,
similar to digital's system transformation in the 1970s / 80s.
-
Measured Channel Current
-
MOSFET Current-Voltage Curves
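EKV Model
$I = I_f - I_r$
$I_{f,r} = 2 I_{th} \ln^2\left(1 + e^{(\kappa (V_g - V_T) - V_{s,d}) / (2 U_T)}\right)$
$I_{th} = \mu C_{ox} U_T^2 (W/L) / \kappa$
Subthreshold
$I = 2 I_{th} \left( e^{(\kappa (V_g - V_T) - V_s)/U_T} - e^{(\kappa (V_g - V_T) - V_d)/U_T} \right)$
$I_f = 2 I_{th} \, e^{(\kappa (V_g - V_T) - V_s)/U_T}$ (Saturation, $V_{ds} > 4 U_T$)
Above-threshold
$I_f = \frac{\mu C_{ox}}{2 \kappa} \frac{W}{L} \left( \kappa (V_g - V_T) - V_s \right)^2$ (Saturation, $I_r \approx 0$, $V_{ds} > V_{on}$)
DIBL / $V_A$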
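The EKV expressions on this slide can be explored numerically. The following is a minimal sketch, assuming the usual EKV form $I = I_f - I_r$ with $I_{f,r} = 2 I_{th} \ln^2(1 + e^{(\kappa (V_g - V_T) - V_{s,d})/(2 U_T)})$. The parameter values (`i_th`, `kappa`, `v_t`, `u_t`) are illustrative placeholders, not values from the lecture, and DIBL / Early-voltage effects are ignored.

```python
import math


def ekv_current(vg, vs, vd, i_th=1e-7, kappa=0.7, v_t=0.5, u_t=0.0258):
    """Channel current I = I_f - I_r from the EKV interpolation (nFET, volts, amps).

    All default parameter values are hypothetical and only for illustration.
    """
    def branch(v_terminal):
        x = (kappa * (vg - v_t) - v_terminal) / (2.0 * u_t)
        return 2.0 * i_th * math.log1p(math.exp(x)) ** 2

    return branch(vs) - branch(vd)


def subthreshold_saturation(vg, vs, i_th=1e-7, kappa=0.7, v_t=0.5, u_t=0.0258):
    """Subthreshold limit in saturation: I_f = 2 I_th exp((kappa(Vg - VT) - Vs) / UT)."""
    return 2.0 * i_th * math.exp((kappa * (vg - v_t) - vs) / u_t)


def above_threshold_saturation(vg, vs, i_th=1e-7, kappa=0.7, v_t=0.5, u_t=0.0258):
    """Above-threshold limit in saturation: square law in (kappa(Vg - VT) - Vs)."""
    return i_th / (2.0 * u_t ** 2) * (kappa * (vg - v_t) - vs) ** 2


# Compare the full expression against both limits.
print(ekv_current(0.0, 0.0, 1.0), subthreshold_saturation(0.0, 0.0))
print(ekv_current(3.0, 0.0, 3.0), above_threshold_saturation(3.0, 0.0))
```

The single interpolation formula reduces to the exponential form well below threshold and to the square law well above it, which is why one expression can cover both regimes.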
-
Classic Multilevel EEPROMs
Vtun
Tunneling
Junction
First reported EEPROM element in standard CMOS
(Thomson and Brooke, 1989)
ETANN: Floating-Gate element
used for biasing (Holler et al., 1989)
EEPROM Process, bidirectional tunneling
GND GND
V2V1
ISD voice recorder ICs
(answering machine messages, greeting cards, etc.)
Many standard IC processes allow for
EEPROM devices (standard cells, standard process)
Most commercial EEPROMs are multibit
-
Programmable Analog Transistors
Otherwise, need a DAC at every parameter and/or memory, etc.
Standard CMOS
Data retention:
1 23 456278 495 :-%(;
-
Electron Transport in a subthreshold nFET
-
Measurements and Modeling of
Hot-Electron Injection
-
Impact Ionization
The mean rate of an impact-ionization
collision is highly energy dependent
Impact current is proportional
to source current
-
pFET Hot-Electron Injection
The injected electrons are generated
by hole impact ionizations.
Injection current is proportional to
source current, and is an
exponential function of $\Phi_{dc}$.
3B&QZ [
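The two dependencies stated on this slide (proportional to source current, exponential in the drain-to-channel potential) can be combined into a simple first-order expression. The sketch below assumes a product form, $I_{inj} = I_{inj0} (I_s / I_{s0}) e^{\Phi_{dc} / V_{inj}}$. The product form and every parameter value (`i_inj0`, `i_s0`, `v_inj`) are assumptions for illustration, not values from the lecture.

```python
import math


def injection_current(i_source, phi_dc, i_inj0=1e-15, i_s0=1e-9, v_inj=0.1):
    """First-order hot-electron injection sketch (all defaults hypothetical).

    Linear in the source current, exponential in the drain-to-channel
    potential phi_dc, as stated on the slide.
    """
    return i_inj0 * (i_source / i_s0) * math.exp(phi_dc / v_inj)


base = injection_current(1e-9, 3.0)
print(injection_current(2e-9, 3.0) / base)   # effect of doubling the source current
print(injection_current(1e-9, 3.1) / base)   # effect of raising phi_dc by v_inj
```

In this form, doubling the source current doubles the injection current, while raising the drain-to-channel potential by `v_inj` scales it by a factor of e.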
-
Injection Above and Below VT
[Figure labels: source, channel, drain; steps 1, 2, 3, 4]
pFET injection, Above VT, Ohmic
pFET injection, Sub VT, Saturation
-
Floating-Gate Devices as Circuit elements
Analog Signal processing at EEPROM densities
NIPS 1994
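[Schematic labels: floating-gate device with Vdd, Vtun, Vd, Vg]
Neuron MOS (νMOS) (Shibata and Ohmi, 1992)
[Schematic, drawn twice: inputs a3, a2, a1, a0 coupled through capacitors 8C, 4C, 2C, C to Vout, with Vdd and GND]
4-bit DAC (no sampling)
[Schematic, drawn twice: two-input floating-gate device with Gate1, Gate2, GND, Iout]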
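The 4-bit DAC on this slide uses binary-weighted capacitors (8C, 4C, 2C, C) on inputs a3 to a0. A minimal sketch of the ideal behaviour follows, assuming each input is driven to either GND or Vdd, the summing node carries no stored charge, and parasitic capacitance is ignored. These idealisations, the function name, and the 5 V default are assumptions, not details from the lecture.

```python
def capacitive_dac(bits, vdd=5.0, unit_cap=1.0):
    """Ideal binary-weighted capacitive divider.

    bits is (a3, a2, a1, a0), each 0 or 1. The output is the capacitance
    driven to Vdd divided by the total capacitance, scaled by Vdd.
    """
    weights = (8, 4, 2, 1)
    if len(bits) != len(weights) or any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be four values, each 0 or 1")
    total = sum(w * unit_cap for w in weights)
    driven = sum(w * unit_cap * b for w, b in zip(weights, bits))
    return vdd * driven / total


for code in range(16):
    bits = tuple((code >> shift) & 1 for shift in (3, 2, 1, 0))
    print(bits, round(capacitive_dac(bits), 4))
```

Because only capacitor ratios appear in the result, the value of the unit capacitor C drops out, and the 16 codes step the output in increments of Vdd/15.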
-
[Schematic labels: inputs In+ and In-, tail current Itail, floating-gate transistors with Vg and Vtun (S1, S2, D1, D2), transistors M1 to M10, bias circuitry, nodes VA and VB, Vdd, Vout]
Input Offset Voltage Drifts
by 130 µV over 170 °C
Measured Offset Voltage Drift vs. Temperature
Input Offset Voltage
Reduced to 25 µV
Prog. Analog ICs Industrial Respect
V. Srinivasan, G. Serrano,
J. Gray, and P. Hasler,
CICC 2005, pp. 739-742.
(Best paper CICC 2005)
Gm-C filters, C4 Filters, ADCs, DACs, V regulators
-
Floating-Gate Voltage Output DAC
Process / Vdd: 0.5 µm CMOS / 5 V
Linearity: 10 bit (INL/DNL)
Epot accuracy: < 100 µV (measured), < 1 µV (theoretical)
Sample rate: ~10 MSPS (instrumented), > 100 MSPS (on-chip)
Input caps: 140 fF
-
Analog-Digital Signal Processing
CADSP = Cooperative Analog-Digital
Signal Processing
Digital and Analog SP Efficiency
Custom Analog ~ 1000 - 10000 times more
efficient than Custom Digital (Mead 1990)
Analog (VMM): 10 MMAC/µW
( = 10 TMAC/W)
Digital: 4 MMAC/mW (DSP)
[Block diagram: real world (analog), A/D converter, DSP processor, computer (digital)]
[Block diagram: real world (analog), ASP IC, specialized A/D, DSP processor, computer (digital)]
Computation           MMAC/µW          Ratio to digital
Low-power DSPs        0.02 to 0.002    1
Analog VMM            1 to 30          1000
Analog filterbanks    30 to 1000       10000
Analog VQ             1 to 10          300
Analog HMM            > 1000           > 100000
[Block diagram labels: microphone, cepstrum, VQ, HMM, digital signal processing]
-
FPAAs are Gaining Momentum
Concept Simulation VLSI Fabrication Testing
(3 months)
x 3
Concept Simulation/
Synthesis Testing VLSI Fabrication
x 20
Large-Scale Field
Programmable Analog
Arrays (FPAA)
Approach Built on Floating-Gate Circuits
RASP 1.x (2002) (T. Hall, P. Hasler, et al., FPL, Sept. 2002)
RASP 2.x:
RASP 2.5, 2.7: 2004-2007
- > 50,000 Prog. Analog Devices
- Used by > 100 Eng
RASP 2.8x: 2008-
- Used by > 100 Eng
RASP 2.9x: 2009-
Jan 2008
Can be a prototyping tool,
early devices,
or final application
-
RASP Programming/Configuration
[Schematic labels: Vg, Vd, Program, Run (Program), Vin, Vout, Vdd, GND, C]
[Array map: columns A to H; a VMM row at address 0, GP rows at addresses 56, 98, 140, 182, 224, and a VMM row at address 266; column addresses 252, 216, 180, 144, 108, 72, 36, 0; callouts 1 to 7]
-
RASP 2.8 / 2.9 Series of FPAA devices
0.35um CMOS
Size ~ 3mm x 3mm
I/O pins ~ 56 (100 pin package)
2.8a: General FPAA
2.8b: BioChannel FPAA
2.8c: Sensor FPAA
2.8d: MITE FPAA
a low-power FPGA
Used by >100 Eng.
Switches are not dead weight
On-chip Programming 120 dB DR TIA
9 bit ramp ADC 7 bit DAC
RASP 2.9 IC family
Family of nine FPAA ICs
Generic FPAA Block
FPAA with Channel CABs
FPAA with Channel CABs+ Adaptive Synapses
FPAAs with Adaptive blocks
Larger Devices: 5mm x 5mm (x3)
100 CABs;
potentially 1 TMAC from one chip
Better Reticle Design: more # of devices
Custom versus FPGAs:
x2-3 speed, x10 area, x100 power
Custom versus FPAAs:
< x2 speed, < x2 area, < x2 power
RASP 2.8 IC family
-
Looking Closer at
CAB Components
nFET Transistors
pFET Floating-Gate Transistors
Transmission Gate
Floating Capacitors (2 terminals)
Basic 9-Transistor OTA
FG input 9-Transistor OTA
FG input 9-Transistor, Buffer Connected OTA
-
Other RASP 2.8 Architectures
RASP 2.8 architecture with transistor channel /synapses as CAB elements
RASP 2.8c: Sensor enabled FPAA
RASP 2.8d : MITE Enabled FPAA
RASP 2.8 architecture with MITE CAB
design and current mode support circuitry
RASP 2.8 architecture with additional CABs for Universal Sensor Circuits
RASP 2.8b: Bio enabled FPAA
Inspired by FPNA work (Farquhar et al., 2006)
-
Building Bridges between
Algorithms and Hardware
Building Infrastructure:
Testing / Demonstration Boards
& teaching how to build
- Wide use of FPAA test platform
- Smaller Board development /
dedicated Programming boards
- FPAA chip specific adaptor boards (single and multiple chip platforms)
Software Infrastructure / Tools
First automated simulink to
system measurement test, Dec 2008
Some next directions
Targeting to SPICE / Simulink design tools
- More simulation models
- Noise, SNR, Distortion
Developed
visual tool for
routing (RAT)
Extensive Library
(working circuits)
Parameter
Translation
Starting design
at high level
-
Rapid Prototyping using FPAAs
RASP 2.7 PhotoReceptor Response
[Figure labels: paper strip; positions 1, 2, 3 and corresponding responses 1, 2, 3]
-
Levels of Energy Efficiency
Subthreshold Transistor Operation
Programmable Circuits (FG transistors)
Analog Signal Processing
Configurable Signal Processing
Highest throughput / amount of power
Eliminate mismatch
Programmability
Wide accessibility
~ x1000 improvement in power efficiency
These techniques open further opportunities to utilize / explore
biologically inspired techniques
Large need for tools to compile / program these systems.
Link most useful at system /sig processing level
Education / training / foundational theory is critical for designing.
Moving analog approaches / conceptual framework to a system design approach,
similar to digital's system transformation in the 1970s / 80s.