EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc...

14
1 EE241 - Spring 2013 Advanced Digital Integrated Circuits Lecture 18: Dynamic Voltage Scaling 2 Announcements Homework 3 due today Quiz #3 today

Transcript of EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc...

Page 1: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

1

EE241 - Spring 2013Advanced Digital Integrated Circuits

Lecture 18: Dynamic Voltage Scaling

2

Announcements

Homework 3 due today

Quiz #3 today

Page 2: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

2

3

Reading

Burd et al, A dynamic voltage scaled microprocessor system, IEEE Journal of Solid-State Circuits, vol. 35, no.11, pp. 1571-1580, Nov. 2000.

4

Outline

Last lecturePower-performance tradeoffs at circuit level

Tradeoffs through supply voltage

This lectureMultiple supplies

Dynamic voltage scaling

Page 3: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

3

5. Low Power DesignF. Multiple Supplies

6

Power /Energy Optimization Space

Constant Throughput/Latency Variable Throughput/Latency

Energy Design Time Sleep Mode Run Time

Active

Logic design

Scaled VDD

Trans. sizing

Multi-VDD

Clock gatingDFS, DVS

Leakage

Stack effects

Trans sizing

Scaling VDD

+ Multi-VTh

Sleep T’s

Multi-VDD Variable VTh

+ Input control

+ Variable VTh

Page 4: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

4

7

Reducing Active Power

Downsizing, lowering the supply on the critical path will lower the operating frequencyDownsize (lowering supply) non-critical paths

Narrows down the path delay distributionIncreases impact of variations

8

Multiple Supply Voltages

Block-level supply assignmentHigher throughput/lower latency functions are implemented in higher VDD

Slower functions are implemented with lower VDD

Often called “Voltage islands”Separate supply grids, level conversion performed at block boundaries

Multiple supplies inside a blockNon-critical paths moved to lower supply voltageLevel conversion within the blockPhysical design challenging

Page 5: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

5

9

Leakage Issue

Driving from VDDL to VDDH Level converter

10

Multiple Supplies in a Block

Lower VDD portion is shaded

CVS StructureConventional Design

Critical Path

Level-Shifting F/F

Critical Path

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

FF

M.Takahashi, ISSCC’98. “Clustered voltage scaling”

Page 6: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

6

11

Multiple Supplies in a Block

CVS Layout:

Usami’98

12

Level-Converting Flip-Flop

CK

CK

CLK CK

CK

D

VL

CK

Q

CK

VH

M1 M2

CK

Page 7: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

7

13V1 = 1.5V, VTH = 0.3V, p(t):lambda

Po

wer

Red

uct

ion

Rat

io

0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

V1 (V)

V2

(V

)

+

V2 (V)V

3(V

)

From Two to Three VDD’s

From Kuroda

14

1.0

0.5

Su

pp

ly V

olt

ag

e R

ati

o

1.0

0.4

0.5 1.0 1.5V1 (V)

Po

we

r D

iss

ipa

tio

n R

ati

o

V2/V1

P2/P1

{ V1, V2 }

V2/V1

V3/V1

{ V1, V2, V3 }

0.5 1.0 1.5V1 (V)

P3/P1

V2/V1

V3/V1

V4/V1

0.5 1.0 1.5V1 (V)

P4/P1

{ V1, V2, V3, V4 }

The more VDD’s, the less power, but the effect will be saturated. Power reduction effect will be decreased as VDD’s are scaled. Optimum V2/V1 is around 0.7.

Optimum Numbers of Supplies

Hamada, CICC’01

Page 8: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

8

5. Low Power DesignG. Dynamic Voltage Scaling

16

Power /Energy Optimization Space

Constant Throughput/Latency Variable Throughput/Latency

Energy Design Time Sleep Mode Run Time

Active

Logic design

Scaled VDD

Trans. sizing

Multi-VDD

Clock gatingDFS, DVS

Leakage

Stack effects

Trans sizing

Scaling VDD

+ Multi-VTh

Sleep T’s

Multi-VDD Variable VTh

+ Input control

+ Variable VTh

Page 9: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

9

17

Adaptive Supply Voltages

18

Processors for Portable Devices

1000

100

10

1

Perf

orm

ance

(MIP

S)

Processor Energy (Watt*sec)1 100.1

• Eliminate performance energy trade-off

PDAs

Pocket-PCs

NotebookComputers

DynamicVoltageScaling

BurdISSCC’00

Page 10: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

10

19

Typical MPEG IDCT Histogram

20

timeSystem Idle

DesiredThroughput

Maximum Processor Speed

Background andhigh-latency processes

Compute-intensive andlow-latency processes

• Maximize Peak Throughput• Minimize Average Energy/operation

System Optimizations:BurdISSCC’00

Processor Usage Model

Page 11: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

11

21

Compute ASAP:D

eliv

ered

Thr

ough

put

Clock Frequency Reduction:

Excessthroughput

Always high throughput

Energy/operation remains unchanged…while throughput scaled down with fCLK

fCLK

Reduced

time

time

Common Design Approaches (Fixed VDD)

22

0

0.5

1

0 0.5 1

Ener

gy/o

pera

tion

Throughput ( fCLK)

Constant supply voltage

Reduce VDD, slowcircuits down.

~10x EnergyReduction

3.3V

1.1V

BurdISSCC’00

Scale VDD with Clock Frequency

Page 12: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

12

23

Inverter

RingOsc

RegFile

SRAM

1.0

0.5

0VT 2VT 3VT 4VT

Nor

mal

ized

max

. fC

LK

VDD

Delay tracks within +/- 10%BurdISSCC’00

CMOS Circuits Track Over VDD

24

time

• Dynamically scale energy/operation with throughput.• Always minimize speed minimize average energy/operation.• Extend battery life up to 10x with the exact same hardware!

Vary fCLK,VDD

Del

iver

edTh

roug

hput

1 2 Dynamically adapt

BurdISSCC’00

Dynamic Voltage Scaling (DVS)

Page 13: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

13

25

• DVS requires a voltage scheduler (VS).• VS predicts workload to estimate CPU cycles.• Applications supply completion deadlines.

0

20

40

60

80

0 0.2 0.4 0.6 1.0 1.2 1.40.8

Processor Speed (MPEG)

F DES

IRED

(M

Hz)

Time (sec)

Operating System Sets Processor Speed

DESIREDCPU cycles

Ftime

26

RST Counter

Latch

Digital Loop Filter

LCDD

VDD

PENAB

NENABFERR

FMEAS

f1MHz

0110

100 FDES

+Register

fCLK

Ring Oscillator Processor

IDD

• Feedback loop sets VDD so that FERR 0.• Ring oscillator delay-matched to CPU critical paths.• Custom loop implementation Can optimize CDD.

7

Buck converter

Set byO.S.

BurdISSCC’00

Converter Loop Sets VDD, fCLK

Page 14: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/...12 23 Inverter RingOsc RegFile SRAM 1.0 0.5 0 V T2V 3VT 4VT Normalized max. f CLK VDD Delay tracks within +/-

14

27

Next Lecture

Continue DVS