Adaptive Supply and Threshold Circuits and Applications
Elad Alon, Kevin Nowka (IBM Research), Vladimir Stojanović, Mark Horowitz
Why Adaptive Vdd/Vth?
• No one transistor meets all needs
  • This transistor is too leaky…
  • This transistor is too slow…
• Modern processes usually have lots of device options, but they still have a set of fixed characteristics
  • Optimum characteristics are often environment dependent and hence vary with time, workload, etc.
• May want to tune both Vdd and Vth on a block-by-block basis to minimize total energy
• Supply can be set/controlled with regulators - how about Vth?
Body Biasing
• Usual approach for adjusting Vth: body bias
[Figure: gate schematic with labels Vdd, Vbp, Vbn]
[Figure: Simulated Power vs. Frequency. Normalized power vs. normalized frequency for RBB (-1V), RBB (-0.5V), ZBB, FBB (0.25V), FBB (0.5V)]
• Unfortunately, body bias is not very effective in modern technologies
  • Less than 100mV shift in Vth across full range of bias
• Hardly any effect on power vs. frequency (traced by sweeping Vdd)
• Not very promising…
Adjusting Vth with Skewed Supplies
• Assume that delay (and power) is dominated by edges in a particular direction
  • We’ll come back to the other edges shortly
• “Vth” can be adjusted by skewing supplies of pos. edge gates (PMOS) vs. neg. edge gates (NMOS)
  • Notation: ΔVth>0 means device’s Vth reduced by ΔVth
[Figure: skewed-supply gates with rail labels Vdd, Vss, Vdd+ΔVth, Vss+ΔVth]
Power vs. Frequency Preview
[Figure: Ring Oscillator w/Body Bias. Simulated normalized power vs. normalized frequency for RBB (-1V), RBB (-0.5V), ZBB, FBB (0.25V), FBB (0.5V)]
[Figure: Skewed Supply Ring Oscillator. Normalized power vs. normalized frequency for ΔVth = -0.1V, 0V, 0.1V, 0.2V]
Outline
• Skewed Supply Logic Circuits
• Adaptive Implementation
• Application to Minimum Energy Systems
• Conclusions
What About the Non-Critical Edge?
[Figure: gates on the pVdd/pVss and nVdd/nVss rails; level labeled Vdd - Vth]
• Performance benefit negated if need to wait for the other (slow) edge
[Figure: gates on the pVdd/pVss and nVdd/nVss rails, shown for ΔVth > 0 and ΔVth < 0; levels labeled Vdd and Vth]
• Skewed supply directly shifts Vth of non-critical devices in the opposite direction from the critical devices
• Need to return to default state so that leakage isn’t set by (reduced threshold) non-critical devices
Self-Resetting Skewed Supply Gates
• Keep gates in default state most of the time: self-reset
• Can be extended to use delayed self-reset (i.e. interlock mechanism to guarantee input pulses overlap)
• Another option: use self-resetting critical path replica to generate en/en_b signals for every level of logic gates
Self-Resetting Gates Challenges
• Pulse-width needs to (at least somewhat) track delay across Vdd and ΔVth
  • Don’t want pulses to disappear
  • Don’t want reset, re-enable delay to become critical path
[Figure: self-resetting N-gate with N-stack and P-stack; labels nVdd, nVss, en, outn]
• Maintain proper operation at high |ΔVth|
• ΔVth>0: P-stack in subthreshold, N-stack Vth≈0
• Gate could fire even when inputs unasserted
• ΔVth<0: subthreshold N-stack vs. Vth≈0 P-stack leakage
Self-Resetting N-Gate: Keeper
• Keeper structure helps boost voltage margin for ΔVth>0 (as opposed to a weakened P-stack connected to inputs)
• Unfortunately, can’t really make keeper strength track because gate needs to reset to nVdd.
• (Unless use yet another supply nnVss…)
• For ΔVth<0, need to make sure N-stack can always overpower the keeper
• May want to go back to P-stack connected to inputs as “keepers”
Self-Resetting N-Gate: Reset Path
• Pulse width tracks delay by alternating n/p supplies on reset path
  • Falling edge outn traverses “critical” edge through reset gates
• However, the re-enabling edge will then have the “non-critical” delay
  • For ΔVth>0, re-enable path will be slow – NOR gate cuts path in half
• For high ΔVth even this might not be enough – another option next
• For ΔVth<0, re-enable edge will be fast – need to make sure the gate fully resets (or at least turns on the keeper).
Self-Resetting N-Gate: Reset Path (#2)
• Even with NOR gate, at high ΔVth re-enable delay can be VERY slow
  • Devices in that direction can easily be in subthreshold
• Break the “rules” and have gates on re-enable path swing from nVss to pVdd
• Often costs less power than reducing the fanout
• Also allows evaluate device at bottom of N-stack to see ΔVth
Self-Resetting P-Gate
• Could be mirrored version of N-gate, but because of higher NMOS drive current (and sometimes lower Vth), input “keeper” can still be relatively effective (even when ΔVth>0)
[Figure: self-resetting P-gate with P-stack and N-stack; rails pVdd, pVss, nVdd, nVss; output outp]
Skewed Supply Oscillator
• Use ring oscillator as a test structure to characterize the gates
  • Helps find issues that arise at various operating points
• Pulse “chases its own tail”:
Operating Range and Power vs. Frequency
[Figure: Normalized power vs. normalized frequency for ΔVth = -0.1V, 0V, 0.1V, 0.2V, with operating points marked at Vdd = 1V, 600mV, 500mV]
• Simulation results from a 90nm triple well technology
  • Since thresholds are adjusted by skewing supplies, triple well isn’t a requirement
• Gates designed for ΔVth>0 operation
• Without heavy optimization, covers ~300mV range in ΔVth
Outline
• Skewed Supply Logic Circuits
• Adaptive Implementation
• Application to Minimum Energy Systems
• Conclusions
Adaptive System Block Diagram
[Figure: block diagram with Frequency-Locked Loop, Power Minimization Loop, Logic Block, and Critical Path Replica Oscillator; signals fref, flogic, Plogic, Vdd, Vth]
• “High” bandwidth FLL enforces frequency constraint by setting Vdd
• “Low” bandwidth threshold loop attempts to minimize power through Vth
• Roles/bandwidths of Vdd/Vth loops can be flipped
• Just want separated bandwidths to minimize stability issues
• More complicated algorithms can do both at same speed, but probably not needed (environment changes usually slow)
Generating the Power Supplies: Switching DC-DC Converters
• Switching DC-DC converters desirable for efficiency
  • But hardest to integrate
  • Want external inductors or efficiency may suffer
• Power measurement (for Vth loop) can be tricky
  • Could use extra series resistor, but again costs efficiency
• May get that resistance from an on-chip inductor anyways
[Figure: switching converter with labels Vsup, Vdd, Vss]
Generating the Power Supplies: On-Chip Linear Regulators
• On-chip linear regulators most desirable for integration
  • High bandwidths easy to achieve
• Easy to measure power
  • External supply fixed, just mirror output device current
• Efficiency could greatly suffer, however
  • Especially if get only one Vsup to generate both n and p supplies
[Figure: two linear regulators with references Vref_dd and Vref_ss, supplies Vsup_dd and Vsup_ss, outputs Vdd and Vss]
Generating the Power Supplies: Hybrid Architecture
• “Best of both worlds”
  • High bandwidth, easy to integrate on-chip linear regulators
  • Adjust external switching regulators to just meet linear regulators’ dropout (and minimize loss)
• To minimize external component count, could share external supplies across multiple blocks
  • Of course, at some cost in efficiency
[Figure: Switching Regulators (from Vsup) generating Vsup_dd and Vsup_ss, followed by Linear Regulators (references Vref_dd, Vref_ss) generating Vdd and Vss; Vdropout marked across the linear regulators]
FLL Implementation (1)
• Charge-pump based design
  • Pulse generators + charge pump = analog counter
• nVss serves as global reference (i.e. chip Vss or “0”)
  • Control loops generate the other three rails
[Figure: FLL schematic with Pulse Generator (up_b, dn_b), 1/N divider, Critical Path Oscillator, and pVdd & nVdd Regulators; rails Vsup_dd, pVdd, nVdd, pVss, nVss; signals fref, ffb, flogic, Vc_sup, Vc_thresh (from power loop)]
FLL Implementation (2): Regulators
• For simplicity used on-chip linear regulators
• Vsup_dd, Vsup_ss – external supplies w/headroom for regulators
  • Vsup_dd ≈ Vdd_max+|ΔVth|max+150mV
  • Vsup_ss ≈ -150mV
• Vc_sup sets pVdd, pVss set by power loop
  • Shifted ground on nVdd regulator feedback makes nVdd = pVdd – pVss
[Figure: regulator schematic with control inputs Vc_sup and Vc_thresh, supplies Vsup_dd and Vsup_ss, outputs pVdd, nVdd, pVss referenced to nVss; gate nodes Vgp_d, Vgn_d, Vgp_s]
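The rail relationships on this slide can be sketched numerically. This is only a sketch under a stated assumption: that the power loop places pVss at ΔVth above nVss, as the earlier skewed-supply picture suggests. The function names `rails` and `external_supplies` are hypothetical, not from the talk.

```python
HEADROOM = 0.150  # V, regulator headroom quoted on the slide


def rails(vdd, dvth):
    """Return the four rails (in volts) for supply vdd and threshold shift dvth.

    Assumption (not stated on this slide): pVss = nVss + dvth.
    """
    n_vss = 0.0                 # global reference, chip Vss or "0"
    p_vss = n_vss + dvth        # set by the power loop (assumed mapping)
    p_vdd = vdd + dvth          # set through Vc_sup
    n_vdd = p_vdd - p_vss       # shifted-ground feedback: nVdd = pVdd - pVss
    return {"nVss": n_vss, "pVss": p_vss, "pVdd": p_vdd, "nVdd": n_vdd}


def external_supplies(vdd_max, dvth_abs_max):
    """External supplies that leave headroom for the linear regulators."""
    vsup_dd = vdd_max + dvth_abs_max + HEADROOM
    vsup_ss = -HEADROOM
    return vsup_dd, vsup_ss
```

For example, `rails(1.0, 0.2)` gives pVdd = 1.2 V, pVss = 0.2 V and nVdd = 1.0 V, so both gate types see the same 1 V swing while their rails are offset by ΔVth.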
Power Minimization Algorithm
• Optimization problem:
  • min over {Vdd, Vth} of Pavg(Vdd, Vth), s.t. f = ftarg
• FLL enforces constraint and eliminates Vdd as a variable
  • Vdd is then set by ΔVth and operating frequency
• Simplified minimization algorithm:
  • Step 1: Increase ΔVth by 1 step; measure average power
  • Step 2: Decrease ΔVth by 1 step; measure average power
  • Step 3: Move in direction of lower average power, repeat Step 1
• Works as long as P vs. Vth curve has no locally flat regions (except global minimum)
  • Hard to show analytically, but intuitively (and numerically) true
[Figure: Normalized energy vs. ΔVth, for ΔVth from 0.05 to 0.3]
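The three-step search above can be sketched in a few lines. This is a minimal sketch, not the circuit implementation: `measure_power` stands for an average-power reading taken after the FLL has re-settled Vdd at the given ΔVth, and `toy_power` is a made-up curve with a single minimum, used only to exercise the loop.

```python
def minimize_power(measure_power, dvth0, step, n_iter=50):
    """Step dvth toward lower measured average power and return the final value."""
    dvth = dvth0
    for _ in range(n_iter):
        p_up = measure_power(dvth + step)   # Step 1: probe one step up
        p_dn = measure_power(dvth - step)   # Step 2: probe one step down
        if p_up < p_dn:                     # Step 3: move toward lower power
            dvth += step
        elif p_dn < p_up:
            dvth -= step
        # equal readings: stay where we are
    return dvth


def toy_power(dvth):
    """Hypothetical power curve with its minimum at dvth = 0.15 (illustration only)."""
    return 1.0 + 20.0 * (dvth - 0.15) ** 2


final_dvth = minimize_power(toy_power, dvth0=0.05, step=0.01)
```

Once the search reaches the minimum it dithers by at most one step around it, which is why the step size sets the residual power penalty.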
Power Loop Implementation: Measuring Power
• Mirror regulator current to measure block’s current
  • Voltage fixed, so just add currents from pVdd and nVdd to find total power (current)
• Want more processing if external supply is not fixed
  • Multiply Itot by Vsup_ext
• If Vsup_ext is digitally controlled, multiplication could be done in current domain by programming output mirroring ratio M
[Figure: current-mirror circuit on Vsup_dd with gate nodes Vgp_d and Vgn_d; currents IpVdd and InVdd sum to Itot, which is mirrored with ratio M:1 into Imirr at node Vmirr]
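The arithmetic behind this measurement is small enough to write out. In this sketch the mirror gain is assumed to be defined as output current over input current and to be programmed proportional to Vsup_ext; the slide does not say which way M is defined, and `vsup_nominal` is a hypothetical normalization constant.

```python
def mirrored_power_current(i_pvdd, i_nvdd, vsup_ext, vsup_nominal):
    """Return a current proportional to the block's power.

    Assumption: mirror gain = vsup_ext / vsup_nominal, so scaling the
    mirrored current performs the multiplication by the external supply.
    """
    i_tot = i_pvdd + i_nvdd            # total block current from both regulators
    gain = vsup_ext / vsup_nominal     # programmed mirroring ratio (assumed form)
    return gain * i_tot
```

With a fixed external supply the gain is constant and the summed current alone is enough for the minimization loop, since only comparisons between readings matter.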
Power Loop Implementation: Minimization Algorithm (I)
• Step 1: Pulse upΔ (+ΔVc), enable dnint (integrate –Imirr)
• Step 2: Pulse dnΔ (–ΔVc), enable upint (integrate +Imirr)
• Step 3 happens automatically since:
  • Vc_th[k+1] > Vc_th[k] if Imirr(+ΔVc)<Imirr(-ΔVc)
  • Vc_th[k+1] < Vc_th[k] if Imirr(+ΔVc)>Imirr(-ΔVc)
Power Loop Implementation: Minimization Algorithm (II)
• To keep polarities correct need IΔ·tΔ > Imirr·tint
• May need small pump currents and/or large capacitors, especially if shooting for small ΔVc
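One way to read the update rule and the polarity condition, assuming both the step pulses and the integration act on a single loop capacitor C (a symbol that does not appear in the slides) and that each probe is taken relative to the nominal control voltage:

```latex
\[
\Delta V_c = \frac{I_\Delta\, t_\Delta}{C}, \qquad
V_{c\_th}[k+1] = V_{c\_th}[k]
  + \frac{t_{int}}{C}\Bigl( I_{mirr}(-\Delta V_c) - I_{mirr}(+\Delta V_c) \Bigr)
\]
\[
\text{During a probe the integration moves the control voltage by }
\frac{I_{mirr}\, t_{int}}{C},
\text{ so the probe keeps its sign only if }
\Delta V_c > \frac{I_{mirr}\, t_{int}}{C}
\;\Longleftrightarrow\;
I_\Delta\, t_\Delta > I_{mirr}\, t_{int}.
\]
```

Under this reading the net update is positive exactly when Imirr(+ΔVc) < Imirr(-ΔVc), matching Step 3, and a small ΔVc forces a small integrated charge, hence small pump currents or a large capacitor.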
Outline
• Skewed Supply Logic Circuits
• Adaptive Implementation
• Application to Minimum Energy Systems
• Conclusions
Minimum Energy Systems with Global Supply
• Supply set by global activity vs. leakage energy ratio
  • But blocks may exhibit wide variances in their activities
  • Even a single block’s activity may vary with time (e.g. static vs. dynamic MPEG frame)
[Figure: Adder A, Adder B, Multiplier A, Multiplier B, and Memory, each supplied from Vdd_glob; the adders and multipliers are labeled fglob]
Minimum Energy Systems With Adaptive Supplies
• In subthreshold, minimum energy is independent of Vth
  • Vth increases: both frequency and leakage decrease, net energy stays the same
• Can get minimum energy by adjusting each Vdd, but:
  • Each block would have to operate at its own frequency…
[Figure: Adder A (faddA), Adder B (faddB), Multiplier A (fmultA), Multiplier B (fmultB), and Memory; supplies Vdd_addA, Vdd_addB, Vdd_multA, Vdd_multB]
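The claim that subthreshold minimum energy does not depend on Vth can be checked numerically with a simple exponential-current model. This is a sketch under assumed parameters: the slope factor, I0, C, logic depth, and activity below are illustrative values, not numbers from the talk. In this model the leakage energy per cycle works out to DEPTH²·C·Vdd²·exp(-Vdd/(n·vT)), in which Vth cancels.

```python
import math

N_VT = 1.5 * 0.026   # n * kT/q in volts (assumed slope factor n = 1.5)
I0 = 1e-6            # A, drain current at Vgs = Vth (assumed)
C = 1e-15            # F, switched capacitance per gate (assumed)
DEPTH = 20           # gates in the critical path, also the leaking gate count (assumed)
ACTIVITY = 0.1       # switching activity factor (assumed)


def cycle_time(vdd, vth):
    """Critical-path delay with subthreshold on-current I0*exp((vdd - vth)/N_VT)."""
    i_on = I0 * math.exp((vdd - vth) / N_VT)
    return DEPTH * C * vdd / i_on


def energy_per_cycle(vdd, vth):
    """Dynamic plus leakage energy for one cycle at the maximum frequency."""
    i_off = I0 * math.exp(-vth / N_VT)
    dynamic = ACTIVITY * DEPTH * C * vdd ** 2
    leakage = DEPTH * i_off * vdd * cycle_time(vdd, vth)
    return dynamic + leakage


def min_energy(vth):
    """Grid-search Vdd from 0.1 V to 0.3 V; return (vdd_opt, energy, frequency)."""
    vdds = [0.10 + 0.001 * k for k in range(201)]
    vdd_opt = min(vdds, key=lambda v: energy_per_cycle(v, vth))
    return vdd_opt, energy_per_cycle(vdd_opt, vth), 1.0 / cycle_time(vdd_opt, vth)


results = {vth: min_energy(vth) for vth in (0.3, 0.4, 0.5)}
```

In this model the optimum Vdd and the minimum energy come out the same for every Vth, while the frequency at that optimum drops exponentially as Vth rises, which is why Vdd alone cannot give minimum energy at a prescribed frequency.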
Minimum Energy Systems with Adaptive Supplies and Thresholds
• Controlling both Vdd and Vth allows blocks to achieve minimum energy at arbitrary operating frequency
  • All blocks can then operate at the same (system determined) frequency
  • Much simpler system to design and interface with than only adaptive supply…
[Figure: Adder A (Vth_addA, fglob), Adder B (Vth_addB, fglob), Multiplier A (Vth_multA, fglob), Multiplier B (Vth_multB, fglob), and Memory; supplies Vdd_addA, Vdd_addB, Vdd_multA, Vdd_multB]
Outline
• Skewed Supply Logic Circuits
• Adaptive Implementation
• Application to Minimum Energy Systems
• Conclusions
Conclusions
• Skewed supplies are a promising approach to allow direct control/optimization of effective device thresholds
  • Still lots of issues to work out, of course; more research to be done
• For low-power applications, combined adaptation of Vdd and Vth can achieve per-block minimum energy while maintaining global synchronicity
  • No need for software directives; chip constantly adapts itself to keep energy dissipation as low as possible
• This technique is attractive in high-performance applications as well
  • Improvements in power efficiency mean increased performance in a heat-dissipation limited environment
Bonus Slides
Digital Control Implementation
• Particularly in advanced technologies, can be difficult to get charge pumps to behave as desired
  • Both FLL and power loop well suited to digital control implementations
• FLL:
  • Frequency detect is really easy – just count
  • DAC just needs enough resolution to keep dither small
• Power loop:
  • Power ADC: use mirrored block current as supply for current-starved ring, count
  • Really need (effectively) monotonic DAC, however
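A count-based FLL of the kind described here can be sketched as follows. This is an illustrative model only: `digital_fll`, the DAC range, and the linear `toy_osc` frequency model are assumptions, and the target is taken to be N times fref, in line with the 1/N divider shown in the FLL schematic.

```python
def digital_fll(osc_freq, f_ref, n_div, dac_bits=8, v_min=0.4, v_max=1.2,
                n_cycles=400):
    """Step a DAC code until the oscillator count per reference period equals n_div."""
    code_max = (1 << dac_bits) - 1
    lsb = (v_max - v_min) / code_max
    code = 0
    for _ in range(n_cycles):
        vdd = v_min + code * lsb
        count = int(osc_freq(vdd) / f_ref)   # edges counted in one reference period
        if count < n_div and code < code_max:
            code += 1                        # too slow: raise Vdd by one LSB
        elif count > n_div and code > 0:
            code -= 1                        # too fast: lower Vdd by one LSB
    return v_min + code * lsb


def toy_osc(vdd):
    """Hypothetical monotonic frequency-vs-Vdd model in Hz (illustration only)."""
    return 2e9 * (vdd - 0.3)


locked_vdd = digital_fll(toy_osc, f_ref=10e6, n_div=100)
```

Here one DAC step changes the count by less than one, so the loop settles on a single code; with a coarser DAC the count would jump past the target and the output would dither by one LSB, which is the resolution requirement noted above.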