ELEC 7770 Advanced VLSI Design Spring 2007 Power Aware Microprocessors


Page 1: ELEC 7770 Advanced VLSI Design Spring 2007 Power Aware Microprocessors

Spring 07, Feb 22, ELEC 7770: Advanced VLSI Design (Agrawal)

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Page 2: SIA Roadmap for Processors (1999)

Year                         1999    2002    2005    2008    2011    2014
Feature size (nm)             180     130     100      70      50      35
Logic transistors/cm²        6.2M     18M     39M     84M    180M    390M
Clock (GHz)                  1.25     2.1     3.5     6.0    10.0    16.9
Chip size (mm²)               340     430     520     620     750     900
Power supply (V)              1.8     1.5     1.2     0.9     0.6     0.5
High-perf. power (W)           90     130     160     170     175     183

Source: http://www.semichips.org
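As a rough cross-check of the roadmap columns, the sketch below back-calculates the effective switched capacitance each column implies. It assumes, as a simplification not made on the slide, that all of the listed power is dynamic power P = C_eff·V_DD²·f (leakage ignored); the function and table names are illustrative.

```python
# Implied effective switched capacitance from the 1999 SIA roadmap table.
# ASSUMPTION: all listed power is dynamic, P = C_eff * Vdd^2 * f (no leakage).

# (year, supply voltage in V, clock in GHz, high-performance power in W)
ROADMAP = [
    (1999, 1.8, 1.25, 90),
    (2002, 1.5, 2.1, 130),
    (2005, 1.2, 3.5, 160),
    (2008, 0.9, 6.0, 170),
    (2011, 0.6, 10.0, 175),
    (2014, 0.5, 16.9, 183),
]


def effective_capacitance_nf(vdd: float, f_ghz: float, power_w: float) -> float:
    """Return C_eff = P / (Vdd^2 * f), expressed in nanofarads."""
    f_hz = f_ghz * 1e9
    return power_w / (vdd ** 2 * f_hz) * 1e9


if __name__ == "__main__":
    for year, vdd, f_ghz, power in ROADMAP:
        c_nf = effective_capacitance_nf(vdd, f_ghz, power)
        print(f"{year}: C_eff = {c_nf:.1f} nF")
```

Under that assumption the implied capacitance rises from about 22 nF (1999) to about 43 nF (2014). The clock rises about 13.5x while the V_DD² term falls about 13x (3.24 to 0.25), which is why the projected power only about doubles.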

Page 3: Power Reduction in Processors

Just about everything is used.

Hardware methods:
- Voltage reduction for dynamic power
- Dual-threshold devices for leakage reduction
- Clock gating, frequency reduction
- Sleep mode

Architecture:
- Instruction set
- Hardware organization

Software methods

Page 4: SPEC CPU2000 Benchmarks

- Twelve integer and 14 floating-point programs, CINT2000 and CFP2000.
- Each program's run time is normalized to obtain a SPEC ratio with respect to the run time on a Sun Ultra 5_10 with a 300 MHz processor.
- CINT2000 and CFP2000 summary measurements are the geometric means of the SPEC ratios.
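The normalization and averaging described above can be sketched as follows. The scale factor of 100 (so that the reference machine scores 100 on every program) is an assumption based on SPEC CPU2000 reporting practice; the run times in the usage example are hypothetical.

```python
import math


def spec_ratio(ref_time_s: float, run_time_s: float, scale: float = 100.0) -> float:
    """Reference run time divided by measured run time, times a scale factor.

    ASSUMPTION: scale=100, so the reference machine (Sun Ultra 5_10,
    300 MHz) scores 100 on every program.
    """
    return scale * ref_time_s / run_time_s


def geometric_mean(values) -> float:
    """Geometric mean, computed through logarithms to avoid overflow."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical (reference time, measured time) pairs in seconds.
    runs = [(1400, 100), (1800, 150), (1100, 110)]
    ratios = [spec_ratio(ref, t) for ref, t in runs]   # 1400, 1200, 1000
    print(ratios, round(geometric_mean(ratios), 1))
```

The geometric mean is used so that the summary does not depend on which machine is chosen as the reference: rescaling every reference time by the same factor rescales the summary by that factor.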

Page 5: Reference CPU: Sun Ultra 5_10, 300 MHz Processor

[Figure: bar chart for the reference machine, y-axis 0 to 3500, one bar per program. CINT2000: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf. CFP2000: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]

Page 6: CINT2000: 3.4GHz Pentium 4, HT Technology (D850MD Motherboard)

[Figure: bar chart of Base ratio and Opt. ratio, y-axis 0 to 2500, for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf.]

SPECint2000_base = 1341, SPECint2000 = 1389

Source: www.spec.org

Page 7: Two Benchmark Results

- Baseline: a uniform configuration, not optimized for a specific program:
  - The same compiler, with the same settings and flags, is used for all benchmarks.
  - Other restrictions.
- Peak: the run is optimized to obtain the peak performance for each benchmark program.

Page 8: CFP2000: 3.6GHz Pentium 4, HT Technology (D925XCV/AA-400 Motherboard)

[Figure: bar chart of Base ratio and Opt. ratio, y-axis 0 to 3000, for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]

SPECfp2000_base = 1627, SPECfp2000 = 1630

Source: www.spec.org

Page 9: CINT2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of Base ratio and Opt. ratio, y-axis 0 to 1000, for gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf.]

SPECint2000_base = 579, SPECint2000 = 588

Source: www.spec.org

Page 10: CFP2000: 1.7GHz Pentium 4 (D850MD Motherboard)

[Figure: bar chart of Base ratio and Opt. ratio, y-axis 0 to 1400, for wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi.]

SPECfp2000_base = 648, SPECfp2000 = 659

Source: www.spec.org
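The base scores on pages 6, 8, 9 and 10 allow a quick comparison between the faster Pentium 4 systems and the 1.7 GHz one. The sketch below only divides the quoted scores and clock rates. It does not separate the effect of clock frequency from other differences between the systems (HT Technology, motherboard, memory), so the comparison is indicative at best.

```python
# Base scores quoted on the slides: (fast GHz, fast score, slow GHz, slow score)
QUOTED = {
    "CINT2000": (3.4, 1341, 1.7, 579),
    "CFP2000": (3.6, 1627, 1.7, 648),
}


def speedup_vs_clock(fast_ghz, fast_score, slow_ghz, slow_score):
    """Return (score ratio, clock-frequency ratio) between two systems."""
    return fast_score / slow_score, fast_ghz / slow_ghz


if __name__ == "__main__":
    for suite, data in QUOTED.items():
        speedup, clock_ratio = speedup_vs_clock(*data)
        print(f"{suite}: {speedup:.2f}x score for {clock_ratio:.2f}x clock")
```

In both suites the score ratio (about 2.32x and 2.51x) is somewhat larger than the clock ratio (2.00x and about 2.12x).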

Page 11: Energy SPEC Benchmarks

Energy efficiency mode: Besides the execution time, the energy efficiency of SPEC benchmark programs is also measured. The energy efficiency of a benchmark program is given by:

$$\text{Energy efficiency} = \frac{1/(\text{Execution time})}{\text{Joules consumed}}$$
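Because the energy consumed equals average power times execution time, the formula above equals 1/(E·T), the reciprocal of the energy-delay product. The sketch below makes that concrete; the function name and the two operating points are hypothetical.

```python
def energy_efficiency(exec_time_s: float, energy_j: float) -> float:
    """(1 / execution time) / joules consumed, i.e. 1 / (E * T).

    With E = P_avg * T this is also 1 / (P_avg * T^2).
    """
    return (1.0 / exec_time_s) / energy_j


if __name__ == "__main__":
    # Hypothetical runs of one program:
    fast = energy_efficiency(100.0, 50.0 * 100.0)   # 50 W for 100 s -> 5000 J
    slow = energy_efficiency(150.0, 25.0 * 150.0)   # 25 W for 150 s -> 3750 J
    print(fast, slow)
```

The slower run uses less energy but still scores lower, because the metric also rewards short execution time.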

Page 12: Energy Efficiency

Efficiency averaged over n benchmark programs:

$$\text{Efficiency} = \left( \prod_{i=1}^{n} \text{Efficiency}_i \right)^{1/n}$$

where Efficiency_i is the efficiency for program i.

Relative efficiency:

$$\text{Relative efficiency} = \frac{\text{Efficiency of a computer}}{\text{Efficiency of reference computer}}$$
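A minimal sketch of the averaging and normalization above, using logarithms so that many small per-program efficiencies do not underflow when multiplied. The function names and the sample efficiencies are hypothetical.

```python
import math


def average_efficiency(efficiencies) -> float:
    """Geometric mean (prod Efficiency_i)^(1/n), computed via logarithms."""
    effs = list(efficiencies)
    return math.exp(sum(math.log(e) for e in effs) / len(effs))


def relative_efficiency(computer_effs, reference_effs) -> float:
    """Average efficiency of a computer divided by that of the reference."""
    return average_efficiency(computer_effs) / average_efficiency(reference_effs)


if __name__ == "__main__":
    # Hypothetical per-program efficiencies for two programs.
    print(relative_efficiency([4e-6, 9e-6], [1e-6, 1e-6]))  # about 6.0
```

With a geometric mean, the relative efficiency also equals the geometric mean of the per-program ratios, so no single program dominates the summary.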

Page 13: SPEC2000 Relative Energy Efficiency

[Figure: relative energy efficiency, y-axis 0 to 6, for SPECINT2000 and SPECFP2000 (three groups). Processors: Pentium [email protected]/0.6GHz energy-efficient processor; Pentium [email protected] (reference); Pentium [email protected]. Operating modes: always max. clock; laptop adaptive clock; min. power, min. clock.]

Page 14: Voltage Scaling

- Dynamic: Reduce voltage and frequency during idle or low-activity periods.
- Static: Clustered voltage scaling
  - Logic on non-critical paths is given a lower voltage.
  - A 47% power reduction with a 10% area increase has been reported.
  - M. Igarashi et al., “Clustered Voltage Scaling Techniques for Low-Power Design,” Proc. IEEE Symp. Low Power Design, 1997.
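The benefit of dynamic voltage scaling can be seen from the dynamic energy of a fixed task, E = C·V_DD²·(cycles), in which frequency cancels out. The sketch compares running at full speed and then idling with running just fast enough to meet a deadline. All numbers are hypothetical, and it assumes, as a first-order simplification, that supply voltage can scale linearly with frequency, that idle dynamic power is zero (clock gated), and that leakage is negligible.

```python
def dynamic_energy_j(c_eff_f: float, vdd_v: float, cycles: float) -> float:
    """Dynamic energy of a task: C_eff * Vdd^2 * cycles (frequency cancels)."""
    return c_eff_f * vdd_v ** 2 * cycles


def scaled_voltage(v_max: float, f_max: float, f_target: float) -> float:
    """First-order ASSUMPTION: supply voltage scales linearly with frequency."""
    return v_max * f_target / f_max


# Hypothetical task: 1e9 cycles that must finish within 1 second.
C_EFF = 1e-9               # 1 nF switched per cycle
CYCLES = 1e9
DEADLINE_S = 1.0
F_MAX, V_MAX = 2e9, 1.2    # hypothetical maximum operating point
F_NEEDED = CYCLES / DEADLINE_S

e_race = dynamic_energy_j(C_EFF, V_MAX, CYCLES)     # full speed, then idle
v_low = scaled_voltage(V_MAX, F_MAX, F_NEEDED)      # 0.6 V at 1 GHz
e_scaled = dynamic_energy_j(C_EFF, v_low, CYCLES)

if __name__ == "__main__":
    print(f"race-to-idle: {e_race:.2f} J, scaled: {e_scaled:.2f} J")
```

Halving both frequency and voltage cuts the dynamic energy of the task by 4x (1.44 J to 0.36 J) while still meeting the deadline.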

Page 15: Pipeline Gating

- A pipelined processor uses speculative execution.
- Incorrect branch prediction results in pipeline stalls and wasted energy.
- Idea: Stop fetching instructions if a branch hazard is expected:
  - If the count (M) of incorrect predictions exceeds a pre-specified number (N), then suspend instruction fetch for some k cycles.

Ref.: S. Manne, A. Klauser and D. Grunwald, “Pipeline Gating: Speculation Control for Energy Reduction,” Proc. 25th Annual International Symp. Computer Architecture, June 1998.
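A cycle-level sketch of the gating rule as stated on the slide. The class and parameter names are hypothetical, and two details the slide leaves open are assumptions here: the counter M is reset whenever gating is triggered, and mispredictions that resolve while fetch is gated still count toward M. The hardware in the cited paper may differ.

```python
class PipelineGate:
    """Suspend fetch for k cycles once the misprediction count M exceeds N."""

    def __init__(self, threshold_n: int, gate_cycles_k: int):
        if gate_cycles_k < 1:
            raise ValueError("gate_cycles_k must be at least 1")
        self.threshold_n = threshold_n
        self.gate_cycles_k = gate_cycles_k
        self.count_m = 0       # incorrect predictions seen since last reset
        self.stall_left = 0    # remaining gated cycles after the current one

    def cycle(self, mispredictions: int = 0) -> bool:
        """Advance one cycle; return True if instruction fetch is enabled."""
        self.count_m += mispredictions
        if self.stall_left > 0:
            self.stall_left -= 1
            return False
        if self.count_m > self.threshold_n:
            # ASSUMPTION: reset M when gating starts; gate this cycle and k-1 more.
            self.count_m = 0
            self.stall_left = self.gate_cycles_k - 1
            return False
        return True


if __name__ == "__main__":
    gate = PipelineGate(threshold_n=1, gate_cycles_k=2)
    print([gate.cycle(m) for m in [1, 1, 0, 0, 0]])
```

With N = 1 and k = 2, the second misprediction pushes M above N, and fetch is gated for exactly two cycles before resuming.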

Page 16: Slack Scheduling

Application: superscalar, out-of-order execution:
- An instruction is executed as soon as the data and resources it needs become available.
- A commit unit reorders the results.
- Delay the execution of instructions whose results are not immediately needed.

Example of RISC instructions:

    add r0, r1, r2;     (A)
    sub r3, r4, r5;     (B)
    and r9, x1, r9;     (C)
    or  r5, r9, r10;    (D)
    xor r2, r10, r11;   (E)

J. Casmira and D. Grunwald, “Dynamic Instruction Scheduling Slack,” Proc. ACM Kool Chips Workshop, Dec. 2000.
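One way to quantify "not immediately needed" for the five instructions above is to compute each instruction's slack as ALAP minus ASAP issue cycle in its dependence graph. This is a sketch under stated assumptions, not the cited paper's mechanism: unit latency, only read-after-write dependences (register renaming is assumed to remove the WAR hazards on r5 and r2), slack measured within this five-instruction window only, and "x1" kept as written and treated as a register name.

```python
from typing import Dict, List, Tuple

# (label, destination register, source registers) in program order.
WINDOW: List[Tuple[str, str, List[str]]] = [
    ("A", "r0", ["r1", "r2"]),    # add r0, r1, r2
    ("B", "r3", ["r4", "r5"]),    # sub r3, r4, r5
    ("C", "r9", ["x1", "r9"]),    # and r9, x1, r9
    ("D", "r5", ["r9", "r10"]),   # or  r5, r9, r10
    ("E", "r2", ["r10", "r11"]),  # xor r2, r10, r11
]


def instruction_slack(window) -> Dict[str, int]:
    """Return {label: ALAP - ASAP}, assuming unit latency and RAW edges only."""
    last_writer: Dict[str, str] = {}
    preds: Dict[str, List[str]] = {label: [] for label, _, _ in window}
    for label, dest, srcs in window:
        for reg in srcs:
            if reg in last_writer:
                preds[label].append(last_writer[reg])
        last_writer[dest] = label

    succs: Dict[str, List[str]] = {label: [] for label in preds}
    for label, ps in preds.items():
        for p in ps:
            succs[p].append(label)

    # Earliest issue cycle; program order is already a topological order.
    asap: Dict[str, int] = {}
    for label, _, _ in window:
        asap[label] = max((asap[p] + 1 for p in preds[label]), default=0)
    last_cycle = max(asap.values())

    # Latest issue cycle that does not lengthen the window's critical path.
    alap: Dict[str, int] = {}
    for label, _, _ in reversed(window):
        alap[label] = min((alap[s] - 1 for s in succs[label]), default=last_cycle)

    return {label: alap[label] - asap[label] for label in asap}


if __name__ == "__main__":
    print(instruction_slack(WINDOW))
```

Only C to D is a true dependence, so C and D form the critical path with zero slack, while A, B and E can each be delayed one cycle. Those are the candidates for slower, low-power execution.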

Page 17: Slack Scheduling Example

[Figure: issue timing of instructions A to E in two panels, "Slack scheduling" and "Standard scheduling".]

Page 18: Slack Scheduling

[Figure: block diagram with scheduling logic, a re-order buffer, a slack bit, and low-power execution units.]

Page 19: Clock Distribution

[Figure: clock distribution network; input labeled "clock".]

Page 20: Clock Power

$$P_{clk} = C_L V_{DD}^2 f + \frac{C_L V_{DD}^2 f}{\lambda} + \frac{C_L V_{DD}^2 f}{\lambda^2} + \cdots = C_L V_{DD}^2 f \sum_{n=0}^{\text{stages}-1} \frac{1}{\lambda^n}$$

where C_L = total load capacitance
λ = constant fanout at each stage in the distribution network

Clock consumes about 40% of total processor power.
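The clock-power sum is a finite geometric series, so for many stages the buffer tree adds at most a factor λ/(λ − 1) on top of the power of the clocked load itself. A minimal sketch of the formula; the function names and numbers are hypothetical.

```python
def clock_power_w(c_load_f: float, vdd_v: float, f_hz: float,
                  fanout: float, stages: int) -> float:
    """P_clk = C_L * Vdd^2 * f * sum_{n=0}^{stages-1} 1 / fanout**n."""
    tree_factor = sum(1.0 / fanout ** n for n in range(stages))
    return c_load_f * vdd_v ** 2 * f_hz * tree_factor


def tree_factor_limit(fanout: float) -> float:
    """Limit of the geometric series for many stages: fanout / (fanout - 1)."""
    return fanout / (fanout - 1.0)


if __name__ == "__main__":
    # Hypothetical: 1 nF clock load, 1 V supply, 1 GHz clock, fanout 4.
    for stages in (1, 2, 3, 10):
        print(stages, clock_power_w(1e-9, 1.0, 1e9, 4.0, stages))
```

With a fanout of 4 the drivers add at most one third to the load's own C_L·V_DD²·f, which is why the clocked load capacitance C_L dominates clock power.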

Page 21: Clock Network Examples

Processors: Alpha 21064, Alpha 21164, Alpha 21264
Technology: 0.75 μm CMOS, 0.5 μm CMOS, 0.35 μm CMOS
Frequency (MHz): 200, 300, 600
Total capacitance: 12.5 nF
Clock load: 3.25 nF, 3.75 nF
Clock power: 40%, 40% (20 W)
Max. clock skew: 200 ps (<10%), 90 ps

D. W. Bailey and B. J. Benschneider, “Clocking Design and Analysis for a 600-MHz Alpha Microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627-1633, Nov. 1998.

Page 22: Power Reduction Example

Alpha 21064: 200 MHz @ 3.45 V, power dissipation = 26 W
- Reduce voltage to 1.5 V: power reduced 5.3x to 4.9 W
- Eliminate FP: power reduced 3x to 1.6 W
- Scale 0.75→0.35 μm: power reduced 2x to 0.8 W
- Reduce clock load: power reduced 1.3x to 0.6 W
- Reduce frequency 200→160 MHz: power reduced 1.25x to 0.5 W

J. Montanaro et al., “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1703-1714, Nov. 1996.
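The reduction chain above can be checked step by step. The sketch applies the slide's factors in order and confirms that the voltage factor matches (3.45/1.5)² and the frequency factor matches 200/160; the names are illustrative.

```python
START_POWER_W = 26.0
STEPS = [
    ("Reduce voltage 3.45 V -> 1.5 V", 5.3),
    ("Eliminate FP", 3.0),
    ("Scale 0.75 um -> 0.35 um", 2.0),
    ("Reduce clock load", 1.3),
    ("Reduce frequency 200 -> 160 MHz", 1.25),
]


def apply_reductions(start_w, steps):
    """Divide the power by each factor in turn; return (label, watts) pairs."""
    power = start_w
    trace = []
    for label, factor in steps:
        power /= factor
        trace.append((label, power))
    return trace


if __name__ == "__main__":
    print(f"voltage factor (3.45/1.5)^2 = {(3.45 / 1.5) ** 2:.2f}")
    print(f"frequency factor 200/160 = {200 / 160:.2f}")
    for label, watts in apply_reductions(START_POWER_W, STEPS):
        print(f"{label}: {watts:.2f} W")
```

The intermediate values (about 4.91, 1.64, 0.82, 0.63 and 0.50 W) match the slide to rounding, for an overall reduction of about 52x.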

Page 23: Parallel Architecture

[Figure: a single processor clocked at f, versus two processors, each clocked at f/2, between the same input and output.]

Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two parallel processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f, Power = 0.396CV²f
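The parallel result follows directly from P = CV²f with the slide's multipliers. The sketch also checks that two processors at 0.5f each keep the same total throughput as one processor at f; the names are illustrative.

```python
def relative_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to the single-processor baseline C*V^2*f."""
    return c_scale * v_scale ** 2 * f_scale


# Two processors, each clocked at 0.5f: combined throughput equals the baseline.
N_PROCESSORS = 2
F_EACH = 0.5
throughput = N_PROCESSORS * F_EACH              # 1.0, same as one processor at f

# Slide values: 2.2C total capacitance, supply lowered to 0.6V, clock 0.5f.
parallel_power = relative_power(2.2, 0.6, 0.5)  # 2.2 * 0.36 * 0.5 = 0.396

if __name__ == "__main__":
    print(throughput, round(parallel_power, 3))
```

The extra 0.2C beyond two copies reflects overhead in the duplicated design; the saving comes from the squared voltage term, which the halved clock makes possible.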

Page 24: Pipeline Architecture

[Figure: a single processor clocked at f, versus the same processor split into two halves (½ Proc.) separated by registers, still clocked at f.]

Single processor: Capacitance = C, Voltage = V, Frequency = f, Power = CV²f
Two-stage pipeline: Capacitance = 1.2C, Voltage = 0.6V, Frequency = f, Power = 0.432CV²f

Page 25: Approximate Trend

                 n-parallel proc.    n-stage pipeline proc.
Capacitance      nC                  C
Voltage          V/n                 V/n
Frequency        f/n                 f
Power            CV²f/n²             CV²f/n²
Chip area        n times             10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
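The table's idealized trend can be compared with the two-way worked examples on pages 23 and 24. The sketch assumes the table's first-order scaling, in which voltage falls as V/n; in practice voltage cannot scale that far (it stays above the threshold voltage), and the worked examples include capacitance overhead and stop at 0.6V. The function names are illustrative.

```python
def relative_power(c_scale: float, v_scale: float, f_scale: float) -> float:
    """Dynamic power relative to the baseline C*V^2*f."""
    return c_scale * v_scale ** 2 * f_scale


def parallel_trend(n: int) -> float:
    """n parallel processors: capacitance nC, voltage V/n, frequency f/n."""
    return relative_power(n, 1.0 / n, 1.0 / n)


def pipeline_trend(n: int) -> float:
    """n-stage pipeline: capacitance C, voltage V/n, frequency f."""
    return relative_power(1.0, 1.0 / n, 1.0)


if __name__ == "__main__":
    for n in (2, 4):
        print(n, parallel_trend(n), pipeline_trend(n))
    # Two-way worked examples with overhead and a 0.6V supply:
    print(relative_power(2.2, 0.6, 0.5), relative_power(1.2, 0.6, 1.0))
```

Both organizations ideally reach CV²f/n² (0.25 for n = 2); the worked examples land at 0.396 and 0.432 because of their capacitance overhead and the more modest voltage reduction. The pipeline reaches similar power with far less area than duplicating the processor.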

Page 26: For More on Microprocessors

T. D. Burd and R. W. Brodersen, Energy Efficient Microprocessor Design, Springer, 2002.

R. Graybill and R. Melhem, Power Aware Computing, New York: Plenum Publishers, 2002.