ELEC 7770 Advanced VLSI Design Spring 2007 Power Aware Microprocessors
Spring 07, Feb 22   ELEC 7770: Advanced VLSI Design (Agrawal)
ELEC 7770 Advanced VLSI Design
Spring 2007
Power Aware Microprocessors
Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07
SIA Roadmap for Processors (1999)
Year                     1999   2002   2005   2008   2011   2014
Feature size (nm)         180    130    100     70     50     35
Logic transistors/cm²    6.2M    18M    39M    84M   180M   390M
Clock (GHz)              1.25    2.1    3.5    6.0   10.0   16.9
Chip size (mm²)           340    430    520    620    750    900
Power supply (V)          1.8    1.5    1.2    0.9    0.6    0.5
High-perf. power (W)       90    130    160    170    175    183
Source: http://www.semichips.org
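To make the roadmap's power trend concrete, the sketch below derives two quantities the table implies but does not list: power density (power divided by chip area) and average supply current (power divided by supply voltage). It is a back-of-the-envelope calculation that assumes all of the high-performance power is drawn from the single supply listed; the function name `roadmap_trends` is hypothetical.

```python
# Rows copied from the 1999 SIA roadmap table above.
years = [1999, 2002, 2005, 2008, 2011, 2014]
chip_area_mm2 = [340, 430, 520, 620, 750, 900]
supply_v = [1.8, 1.5, 1.2, 0.9, 0.6, 0.5]
power_w = [90, 130, 160, 170, 175, 183]


def roadmap_trends():
    """Return {year: (power density in W/cm^2, average supply current in A)}."""
    trends = {}
    for year, area, vdd, p in zip(years, chip_area_mm2, supply_v, power_w):
        density = p / (area / 100.0)  # 100 mm^2 = 1 cm^2
        current = p / vdd             # I = P / V, assuming one supply carries all power
        trends[year] = (density, current)
    return trends


for year, (density, current) in roadmap_trends().items():
    print(f"{year}: {density:5.1f} W/cm^2, {current:5.0f} A")
```

Power density stays roughly between 20 and 31 W/cm², while the supply current grows from 50 A in 1999 to 366 A in 2014 as the supply voltage drops.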
Power Reduction in Processors
Just about everything is used.
Hardware methods:
  - Voltage reduction for dynamic power
  - Dual-threshold devices for leakage reduction
  - Clock gating, frequency reduction
  - Sleep mode
Architecture:
  - Instruction set
  - Hardware organization
Software methods
SPEC CPU2000 Benchmarks
Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
Each program's run time is normalized to obtain a SPEC ratio with respect to the run time of a Sun Ultra 5_10 with a 300 MHz processor.
CINT2000 and CFP2000 summary measurements are the geometric means of SPEC ratios.
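The normalization and averaging described above can be sketched as follows. The run times are hypothetical, and the factor of 100 reflects the SPEC CPU2000 convention that the reference machine scores 100 on every program; the helper names are illustrative only.

```python
import math


def spec_ratio(ref_time_s, run_time_s):
    # SPEC CPU2000 scales the ratio by 100, so the reference machine scores 100.
    return 100.0 * ref_time_s / run_time_s


def geometric_mean(values):
    # Computed via logarithms to avoid overflow when multiplying many ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical reference and measured run times (seconds) for three programs.
ref_times = [1400.0, 1800.0, 1100.0]
run_times = [140.0, 450.0, 110.0]

ratios = [spec_ratio(r, t) for r, t in zip(ref_times, run_times)]
print(ratios)                  # [1000.0, 400.0, 1000.0]
print(geometric_mean(ratios))  # about 736.8
```

The geometric mean keeps one unusually fast or slow program from dominating the summary the way an arithmetic mean of ratios would.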
Reference CPU: Sun Ultra 5_10 300 MHz Processor
[Figure: per-program results on the reference CPU for the CINT2000 programs (gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf) and the CFP2000 programs (wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi); y-axis 0 to 3500]
CINT2000: 3.4 GHz Pentium 4, HT Technology (D850MD Motherboard)
[Figure: base ratio and optimized (peak) ratio for each CINT2000 program: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf; y-axis 0 to 2500]
SPECint2000_base = 1341, SPECint2000 (peak) = 1389
Source: www.spec.org
Two Benchmark Results
Baseline: a uniform configuration not optimized for a specific program:
  - The same compiler with the same settings and flags is used for all benchmarks
  - Other restrictions
Peak: the run is optimized to obtain the peak performance for each benchmark program.
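The base and peak summary scores reported on the SPEC slides in this section can be compared directly. This short sketch, using only those reported numbers, computes how much the per-program tuning of a peak run gains over the baseline; `peak_gain_percent` is a hypothetical helper.

```python
# Summary scores reported on the SPEC slides in this section: (base, peak).
results = {
    "CINT2000, 3.4 GHz Pentium 4 HT": (1341, 1389),
    "CFP2000, 3.6 GHz Pentium 4 HT": (1627, 1630),
    "CINT2000, 1.7 GHz Pentium 4": (579, 588),
    "CFP2000, 1.7 GHz Pentium 4": (648, 659),
}


def peak_gain_percent(base, peak):
    """Percentage improvement of the peak (optimized) score over the baseline."""
    return 100.0 * (peak - base) / base


for name, (base, peak) in results.items():
    print(f"{name}: peak is {peak_gain_percent(base, peak):.1f}% above base")
```

For these four results, peak exceeds base by only about 0.2% to 3.6%.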
CFP2000: 3.6 GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)
[Figure: base ratio and optimized (peak) ratio for each CFP2000 program: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; y-axis 0 to 3000]
SPECfp2000_base = 1627, SPECfp2000 (peak) = 1630
Source: www.spec.org
CINT2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
[Figure: base ratio and optimized (peak) ratio for each CINT2000 program: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf; y-axis 0 to 1000]
SPECint2000_base = 579, SPECint2000 (peak) = 588
Source: www.spec.org
CFP2000: 1.7 GHz Pentium 4 (D850MD Motherboard)
[Figure: base ratio and optimized (peak) ratio for each CFP2000 program: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi; y-axis 0 to 1400]
SPECfp2000_base = 648, SPECfp2000 (peak) = 659
Source: www.spec.org
Energy SPEC Benchmarks
Energy efficiency mode: besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:
$$\text{Energy efficiency} = \frac{1/(\text{execution time})}{\text{joules consumed}}$$
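Because the energy consumed equals average power times execution time, this metric equals 1/(P·t²), so it rewards speed twice. The sketch below, with hypothetical run times and an assumed constant average power of 20 W, shows that halving the run time at equal power quadruples the efficiency; `energy_efficiency` is an illustrative name.

```python
def energy_efficiency(exec_time_s, energy_j):
    """Energy efficiency = (1 / execution time) / joules consumed."""
    return (1.0 / exec_time_s) / energy_j


# Hypothetical runs of one program at the same average power of 20 W.
# Energy = power x time, so efficiency = 1 / (power x time^2).
slow = energy_efficiency(100.0, 20.0 * 100.0)  # 100 s, 2000 J
fast = energy_efficiency(50.0, 20.0 * 50.0)    # 50 s, 1000 J
print(fast / slow)  # about 4: half the time at equal power, four times the efficiency
```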
Energy Efficiency
Efficiency averaged over n benchmark programs:
$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$
where Efficiency_i is the efficiency for program i.
Relative efficiency:
$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
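A minimal sketch of the averaged and relative efficiency, using hypothetical per-program efficiencies. A useful property of the geometric mean is that the ratio of two geometric means equals the geometric mean of the per-program ratios, so the relative efficiency below is simply (2 × 1 × 4)^(1/3) = 2. The function names are illustrative.

```python
import math


def average_efficiency(efficiencies):
    """Geometric mean of per-program efficiencies: (prod E_i)^(1/n)."""
    n = len(efficiencies)
    return math.exp(sum(math.log(e) for e in efficiencies) / n)


def relative_efficiency(machine_effs, reference_effs):
    """Averaged efficiency of a computer divided by that of the reference computer."""
    return average_efficiency(machine_effs) / average_efficiency(reference_effs)


# Hypothetical per-program efficiencies, in 1/(s*J), for the same three programs.
reference = [2.0e-6, 5.0e-6, 1.0e-6]
candidate = [4.0e-6, 5.0e-6, 4.0e-6]
print(relative_efficiency(candidate, reference))  # about 2.0
```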
SPEC2000 Relative Energy Efficiency
[Figure: relative energy efficiency (y-axis 0 to 6) for SPECINT2000 and SPECFP2000 on three Pentium processors, one an energy-efficient processor and one the reference; clock policies compared: always max. clock, laptop adaptive clock, min. power min. clock]
Voltage Scaling
Dynamic: reduce voltage and frequency during idle or low-activity periods.
Static: clustered voltage scaling
  - Logic on non-critical paths is given a lower voltage.
  - 47% power reduction with 10% area increase reported.
  - M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
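The payoff of dynamic voltage scaling follows from the switching-power relation P = aCV²f. The sketch below assumes, as a simplification, that frequency can be lowered in proportion to voltage and that capacitance and activity stay fixed; under those assumptions halving both cuts power to 1/8 and the energy for a fixed task to 1/4. The function names are hypothetical.

```python
def dynamic_power(c, vdd, f, activity=1.0):
    """Switching power P = a * C * VDD^2 * f (activity factor a held constant)."""
    return activity * c * vdd ** 2 * f


def scaled(v_scale, f_scale, c=1.0, vdd=1.0, f=1.0, cycles=1.0):
    """Power and energy of a fixed task of `cycles` cycles, relative to nominal."""
    p0 = dynamic_power(c, vdd, f)
    p1 = dynamic_power(c, vdd * v_scale, f * f_scale)
    # Energy = power x time, and time = cycles / frequency.
    e0 = p0 * cycles / f
    e1 = p1 * cycles / (f * f_scale)
    return p1 / p0, e1 / e0


# Dynamic scaling during low activity: halve both voltage and frequency.
print(scaled(0.5, 0.5))  # (0.125, 0.25): 1/8 the power, 1/4 the energy per task
```

Static (clustered) scaling keeps f fixed, so the saving on the low-voltage logic is the V² term alone: `scaled(0.8, 1.0)` gives power and energy ratios of about 0.64.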
Pipeline Gating
A pipelined processor uses speculative execution.
Incorrect branch prediction results in pipeline flushes and energy wasted on wrong-path instructions.
Idea: stop fetching instructions if a branch hazard is expected:
  - If the count (M) of unresolved branches likely to be mispredicted exceeds a pre-specified number (N), then suspend instruction fetch for some k cycles.
Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
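The gating rule can be sketched as a per-cycle simulation. This is a hypothetical model, not the hardware of Manne et al.: each cycle's event is +1 when a branch expected to be mispredicted enters the pipeline and -1 when one resolves, M tracks the unresolved count, and fetch is suspended for k cycles whenever M exceeds N.

```python
def gate_fetch(branch_events, n_threshold, k_cycles):
    """Simulate fetch gating over a per-cycle trace of branch events.

    branch_events[t] is +1 when a branch likely to be mispredicted is fetched
    in cycle t, -1 when such a branch resolves, and 0 otherwise. Returns one
    boolean per cycle telling whether instruction fetch is enabled.
    """
    m = 0             # M: unresolved branches expected to be mispredicted
    gated_until = -1  # fetch is suspended through this cycle
    fetch_enabled = []
    for t, event in enumerate(branch_events):
        m += event
        if m > n_threshold and t > gated_until:
            gated_until = t + k_cycles - 1  # suspend fetch for k cycles
        fetch_enabled.append(t > gated_until)
    return fetch_enabled


print(gate_fetch([1, 1, 1, 0, -1, -1, 0, 0], n_threshold=2, k_cycles=3))
# [True, True, False, False, False, True, True, True]
```

If M is still above N when the k cycles expire, this model simply gates again for another k cycles.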
Slack Scheduling
Application: superscalar, out-of-order execution:
  - An instruction is executed as soon as the data and resources it needs become available.
  - A commit unit reorders the results.
Delay the execution of instructions whose result is not immediately needed.
Example of RISC instructions:
  add r0, r1, r2;    (A)
  sub r3, r4, r5;    (B)
  and r9, x1, r9;    (C)
  or  r5, r9, r10;   (D)
  xor r2, r10, r11;  (E)
J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
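Whether a result is "immediately needed" can be approximated by looking ahead for a read of the destination register. The sketch below applies that test to the five instructions above, assuming destination-first operand order and a fixed look-ahead window; it is a hypothetical static approximation, not the dynamic slack detection of Casmira and Grunwald. Only C's result (r9) is read soon, by D, so only C lacks slack.

```python
# The five instructions from the slide: (label, opcode, destination, sources).
# Destination-first operand order is assumed.
program = [
    ("A", "add", "r0", ("r1", "r2")),
    ("B", "sub", "r3", ("r4", "r5")),
    ("C", "and", "r9", ("x1", "r9")),
    ("D", "or", "r5", ("r9", "r10")),
    ("E", "xor", "r2", ("r10", "r11")),
]


def slack_bits(instrs, window):
    """Mark an instruction as having slack (True) if no instruction within the
    next `window` positions reads its destination before it is overwritten."""
    bits = {}
    for i, (label, _, dest, _) in enumerate(instrs):
        needed = False
        for _, _, later_dest, later_srcs in instrs[i + 1 : i + 1 + window]:
            if dest in later_srcs:
                needed = True
                break
            if later_dest == dest:  # value overwritten before any use
                break
        bits[label] = not needed
    return bits


print(slack_bits(program, window=4))
# {'A': True, 'B': True, 'C': False, 'D': True, 'E': True}
```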
Slack Scheduling Example
[Figure: schedules of instructions A through E under slack scheduling and under standard scheduling]
Slack Scheduling
[Figure: scheduling logic with a slack bit, low-power execution units, and the re-order buffer]
Clock Distribution
[Figure: clock distribution network]
Clock Power
$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$
where C_L = total load capacitance
      λ = constant fanout at each stage in the distribution network
Clock consumes about 40% of total processor power.
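The geometric series can be evaluated directly. The sketch below uses hypothetical values (1 nF load, 1.2 V, 1 GHz, fanout 4, five stages) and shows that the buffer stages add roughly 33% to the power of switching the leaf load alone, approaching λ/(λ - 1) as the number of stages grows. The function name is illustrative.

```python
def clock_power(c_load, vdd, f, fanout, stages):
    """P_clk = C_L * VDD^2 * f * sum_{n=0}^{stages-1} 1 / fanout^n.

    The stage n levels back from the leaves drives 1/fanout^n of the leaf load.
    """
    return c_load * vdd ** 2 * f * sum(fanout ** -n for n in range(stages))


# Hypothetical network: 1 nF clock load, 1.2 V supply, 1 GHz, fanout 4, 5 stages.
p = clock_power(1e-9, 1.2, 1e9, fanout=4, stages=5)
leaf_only = 1e-9 * 1.2 ** 2 * 1e9
print(p, p / leaf_only)  # about 1.918 W, and a ratio of about 1.332
```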
Clock Network Examples
                     Alpha 21064     Alpha 21164     Alpha 21264
Technology           0.75μ CMOS      0.5μ CMOS       0.35μ CMOS
Frequency (MHz)      200             300             600
Total capacitance    12.5nF          -               -
Clock load           3.25nF          3.75nF          -
Clock power          40%             40% (20W)       -
Max. clock skew      200ps (<10%)    90ps            -
D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.
Power Reduction Example
Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
  - Reduce voltage to 1.5 V: power reduced 5.3x, to 4.9 W
  - Eliminate FP: power reduced 3x, to 1.6 W
  - Scale 0.75→0.35μ: power reduced 2x, to 0.8 W
  - Reduce clock load: power reduced 1.3x, to 0.6 W
  - Reduce frequency 200→160 MHz: power reduced 1.25x, to 0.5 W
J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
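The chain of reductions above can be checked arithmetically. The sketch divides the starting 26 W by each listed factor in turn; it also shows where the 5.3x voltage factor comes from, since dynamic power scales with VDD². The factors are from the slide; the helper name is illustrative.

```python
# Starting point and reduction factors from the Alpha 21064 example.
start_w = 26.0
steps = [
    ("Reduce VDD 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]


def apply_reductions(power_w, reductions):
    """Divide the power by each reduction factor in turn; return the trace."""
    trace = []
    for name, factor in reductions:
        power_w /= factor
        trace.append((name, power_w))
    return trace


for name, p in apply_reductions(start_w, steps):
    print(f"{name:35s} {p:5.2f} W")

# The voltage step is quadratic: (3.45 / 1.5)^2 = 5.29, i.e. about 5.3x.
print(round((3.45 / 1.5) ** 2, 2))  # 5.29
```

The frequency step, by contrast, is linear: 200/160 = 1.25.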
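Parallel Architecture
[Figure: a single processor clocked at f between input and output, compared with two processors, each clocked at f/2, sharing the input and producing output at rate f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f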
Pipeline Architecture
[Figure: a processor between input and output registers clocked at f, compared with a two-stage pipeline of half processors (½ Proc.) separated by registers, also clocked at f]
Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f
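Both power figures follow from P = CV²f with the scaled C, V, and f shown on the parallel and pipeline slides. The lower voltage is possible because each parallel processor has twice as long per operation, and each pipeline stage has half the logic depth; the 2.2C and 1.2C presumably account for overhead beyond the duplicated or split logic. The helper name is hypothetical.

```python
def relative_power(c_scale, v_scale, f_scale):
    """P'/P for P = C * V^2 * f when C, V, f are scaled by the given factors."""
    return c_scale * v_scale ** 2 * f_scale


parallel = relative_power(2.2, 0.6, 0.5)  # two processors, each at f/2
pipeline = relative_power(1.2, 0.6, 1.0)  # two half-processor stages at f
print(round(parallel, 3), round(pipeline, 3))  # 0.396 0.432
```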
Approximate Trend
               n-parallel proc.   n-stage pipeline proc.
Capacitance    nC                 C
Voltage        V/n                V/n
Frequency      f/n                f
Power          CV²f/n²            CV²f/n²
Chip area      n times            10-20% increase
G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
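The n-dependence in the table can be checked the same way. In this sketch (C, V, and f normalized to 1), both organizations reduce power as 1/n², while the table shows the area cost differs sharply. The ideal V/n scaling is an approximation: in practice the supply cannot drop far below the device threshold, so the trend flattens for large n. The function name is illustrative.

```python
def trend_power(n, c=1.0, v=1.0, f=1.0):
    """Power of the n-parallel and n-stage pipeline designs in the trend table."""
    parallel = (n * c) * (v / n) ** 2 * (f / n)  # nC, V/n, f/n
    pipeline = c * (v / n) ** 2 * f              # C, V/n, f
    return parallel, pipeline


for n in (1, 2, 4):
    print(n, trend_power(n))  # both entries equal 1/n^2
```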
For More on Microprocessors
T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.
R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.