Methods for True Power Minimizationbwrcs.eecs.berkeley.edu/.../Readings/1B_3_stojanovic.pdfMethods...
Transcript of Methods for True Power Minimizationbwrcs.eecs.berkeley.edu/.../Readings/1B_3_stojanovic.pdfMethods...
-
Methods for True Power MinimizationMethods for True Power Minimization
Robert W. Brodersen1, Mark A. Horowitz2, Dejan Markovic1, Borivoje Nikolic1 and Vladimir Stojanovic2
1Berkeley Wireless Research Center, UC Berkeley 2Stanford University
-
Power limited operation
Delay
Unoptimized design
Emax
DmaxDmin
Energy/op
Emin
Achieve the highest performance under the power cap
-
Power limited operation
Delay
Unoptimized design
Var1
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Achieve the highest performance under the power cap
-
Power limited operation
Delay
Unoptimized design
Var1
Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Achieve the highest performance under the power cap
-
Power limited operation
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
How far away are we from the optimal solution?
-
Power limited operation
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Global
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Global optimum – best performance
-
Power limited operation
Delay
Unoptimized design
Emax
DmaxDmin
Energy/op
Emin
Maximize throughput for given energy orMinimize energy for given throughput
-
Design optimization♦ There are many sets of parameters to adjust♦ Tuning variables
Circuit(sizing, supply, threshold)
Logic style(domino, pass-gate, )
Block topology (adder: CLA, CSA, )
Micro-architecture (parallel, pipelined)
topology A
topology BDelay
Ener
gy/o
p
-
Design optimization♦ There are many sets of parameters to adjust♦ Tuning variables
Circuit(sizing, supply, threshold)
Logic style(domino, pass-gate, )
Block topology (adder: CLA, CSA, )
Micro-architecture (parallel, pipelined)
Globally optimal boundary curve: pieces of E-D curves for different topologies
topology A
topology BDelay
Ener
gy/o
p
-
Outline
♦ Circuit optimization♦ Joint optimization♦ Select the most promising sets of tuning variables
♦ Circuit & µArchitecture examples♦ Adder ♦ Add-Compare
♦ Conclusions
-
Energy-delay sensitivity
( )*dd dd
dddd
dd V V
EV
Sens VD
V=
∂∂
= −∂
∂
♦ Proposed by Zyban at ISLPED02
♦ ∆E = Sens(A)(-∆D) + Sens(B)∆D
At the optimal point,all sensitivities should be the same
-
Alpha-power based delay model
( ) dpard dd out
pin indd on
WK V WtW WV V α
⋅= ⋅ +
−
♦ Fitting parametersVon, αd, Kd
-
Alpha-power based delay model
( ) dpard dd out
pin indd on
WK V WtW WV V α
⋅= ⋅ +
− heff
♦ Fitting parametersVon, αd, Kd
♦ Effective fanout, heff
-
Energy model
♦ Switching energy
( ) 20 1 ( ) ( )Sw out par ddE C W C W Vα →= ⋅ + ⋅
♦ Leakage energy( )
00 ( )
th ddV VV
Lk in in ddE W I S e V Dγ− −
= ⋅ ⋅ ⋅ ⋅
-
Sensitivity to sizing and supply♦ Gate sizing (Wi)
( ), , 1
Swi i
nom eff i eff ii
EW ec
D h hW τ −
∂∂
− =∂ ⋅ −∂
∞ for equal heff (Dmin)
♦ Supply voltage (Vdd)
12
1
Swdd Sw v
d vdd
EV E x
D D xV α
∂∂ −
− = ⋅∂ − +
∂ Vth Vdd
Vddnom
Sens(Vdd)
0
xv = (Von+∆Vth)/Vdd
-
Sensitivity to Vth
♦ Threshold voltage (Vth)
0
( )1
( )
th dd on thLk
dth
EV V V V
PD VV α
∂ ∂ ∆ − − ∆
− = ⋅ − ∂ ⋅ ∂ ∆
0 Vth
Vthnom
Sens(Vth)
Low initial leakage ⇒ speedup comes for “free”
-
Optimization setup
♦ Reference/nominal circuit sized for Dmin @ Vddnom, Vthnom known average activity
♦ Set delay constraint
♦ Minimize energy under delay constraint gate sizing Vdd , Vth scaling
-
Kogge-Stone tree adder topology
S0
S15
(A0, B0)
(A15, B15)
♦ Off-path load (gates + wires)♦ Reconvergence (inside ●-block)
-
Tree adder: Sizing optimization♦ Reference: all paths are critical
Nominal (Dmin, Enom)
Sizing opt. (1.1Dmin, 0.3Enom)
Internal energy peaks ⇒Big savings for small delay penalty with resizing
-
Joint optimization: sizing and Vdd
Nominal design
minnomD
( , )nomddf W V( , )newddf W V
( )ddf VEn
ergy
/op
Delay
∆E = Sens(Vdd)•(-∆D) + Sens(W)•∆D
-
Results of joint optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Nominal Design(Dnom, Enom)
Energy efficient curve f(W,Vdd,Vth)
(Dnom, Emin)
80% of energy saved without delay penalty
1 (reference)(Dnom, Emin)1.5Vdd
0.2∞(Dnom, Enom)VthWSens
Sensitivity table
-
Results of joint optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Nominal Design(Dnom, Enom)
Energy efficient curve f(W,Vdd,Vth)
(Dnom, Emin)
(Dmin,Enom)
80% of energy saved without delay penalty40% speedup for same energy
22161 (reference)(Dnom, Emin)22(Dmin, Enom)
1.5Vdd
0.2∞(Dnom, Enom)VthWSens
Sensitivity table
-
A look at tuning variablesSupply Threshold
reliabilitylimit
Sens(Vdd)=Sens(W)=Sens(Vth)=1
Limited range of tuning variables
-
A look at tuning variablesSupply Threshold
reliabilitylimit
Sens(Vdd)=16Sens(Vth)=Sens(W)=22
Sens(Vdd)=Sens(W)=Sens(Vth)=1
Limited range of tuning variables
-
Reducing the number of dimensionsEn
ergy
[Eno
m]
Delay [Dnom]
Threshold and sizing nearly optimal around the nominal point
-
Scope of circuit optimizationEn
ergy
[Eno
m]
Delay [Dnom]
Effective region +/-30% around nominal delay
-
Circuit & µArchitecture optimization♦ Revisit the old argument for parallelism
A B A B
A BA B
f f2f
2f
f
f f f
(a) reference
(c) pipeline(b) parallel
♦ What happens if we can choose optimal Vdd and Vth for each design?
-
Balance of leakage and switching energy
2
ln
Lk
Sw Opt dtech
avg
EE L K
α
=
−
Optimal designs have high leakage current
-
Conclusions♦ All design levels need to be optimized jointly
♦ Equal marginal costs ⇒ Energy-efficient design
♦ Peak performance is VERY power inefficient
♦ Todays designs are not leaky enough to be truly power-optimal
♦ Pipelining starts to gain advantage over parallelism