Process Variation in Near-threshold Wide SIMD Architectures



Page 1: Process Variation in Near-threshold Wide SIMD Architectures


Sangwon Seo (1), Ronald G. Dreslinski (1), Mark Woh (1), Yongjun Park (1), Chaitali Chakrabarti (2), Scott Mahlke (1), David Blaauw (1), Trevor Mudge (1)

University of Michigan (1), Arizona State University (2)

Page 2: Near Threshold Computing

Super Threshold: high performance, high energy consumption

Near Threshold: 10x energy reduction, 10x performance degradation

Sub Threshold: exponentially decreasing performance; increasing leakage becomes dominant


Page 3: Near-threshold Computing

Advantage: high energy efficiency

Disadvantages:

- Low performance throughput, compensated with a very wide SIMD architecture
- Sensitive to variations in threshold voltage, a more critical issue in wide SIMD architectures: increased probability of timing errors and expensive error recovery mechanisms


Page 4: Near-threshold Computing


How bad is the delay variation in wide SIMD architectures running at near-threshold voltages?

How to mitigate the variation-induced timing errors?


Page 5: Delay Variations in 90nm

[Figure: delay variation in 90nm, annotated ~2.3x and ~1.6x]

Uncorrelated variations are averaged out over the chain.
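
The averaging effect can be checked with a small Monte Carlo sketch. It assumes every gate delay is an independent Gaussian sample with mean 1.0, which the slides do not state; the function name `chain_delay_spread` and the 20% per-gate sigma are illustrative placeholders rather than values from the talk. Under that assumption the relative spread of an N-stage chain shrinks roughly as 1/sqrt(N).

```python
import random
import statistics

def chain_delay_spread(n_stages, sigma_over_mu=0.2, trials=4000, seed=0):
    """Return sigma/mu of the total delay of a chain of n_stages gates.

    Assumption (not from the slides): each gate delay is an independent
    Gaussian with mean 1.0 and relative standard deviation sigma_over_mu.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(1.0, sigma_over_mu) for _ in range(n_stages))
        for _ in range(trials)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)

if __name__ == "__main__":
    # Compare a single gate with chains of 10 and 50 stages.
    for n in (1, 10, 50):
        print(n, round(chain_delay_spread(n), 4))
```

Because the spread follows 1/sqrt(N) in this model, quadrupling the chain length only halves it, so each added stage buys less than the one before.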

Page 6: Delay Variations – f(Vdd=0.55V, N)

A long chain helps, but the effect diminishes as N increases.

Variations are exacerbated with technology scaling.

Page 7: Delay Variations – f(Vdd, N=50)

Line edge roughness (LER) causes high variation in advanced technology nodes

Strict design rules

Metal gates with high-k material, or SOI

Advanced lithography

Page 8: Delay Distribution – 90nm GP

1 critical path delay = delay of a chain of 50 FO4 inverters

1-wide system delay = max(delays of 100 critical paths)

128-wide system delay = max(delays of 128 1-wide systems)

Performance Drop
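
A minimal Monte Carlo sketch of the max-of-max delay model on this slide. The counts (100 critical paths per lane, 128 lanes) come from the slide; the Gaussian path-delay distribution, the 5% `path_sigma`, and all function names are assumptions made for illustration only.

```python
import random
import statistics

PATHS_PER_LANE = 100  # critical paths in a 1-wide system (from the slide)

def lane_delay(rng, path_sigma):
    """Delay of a 1-wide system: its slowest critical path.

    Assumption: each path delay is Gaussian with mean 1.0 (normalized).
    """
    return max(rng.gauss(1.0, path_sigma) for _ in range(PATHS_PER_LANE))

def system_delay(rng, width, path_sigma):
    """Delay of a width-wide SIMD system: its slowest lane."""
    return max(lane_delay(rng, path_sigma) for _ in range(width))

def mean_system_delay(width, path_sigma=0.05, trials=100, seed=0):
    """Average system delay over `trials` simulated chips."""
    rng = random.Random(seed)
    return statistics.mean(
        system_delay(rng, width, path_sigma) for _ in range(trials)
    )

if __name__ == "__main__":
    narrow = mean_system_delay(1)
    wide = mean_system_delay(128)
    print(f"1-wide: {narrow:.3f}  128-wide: {wide:.3f}  ratio: {wide / narrow:.3f}")
```

Taking the maximum over more lanes samples further into the slow tail of the same distribution, which is where the performance drop of the wide system comes from in this model.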

Page 9: Variation Effects on 128-wide SIMD Architecture

- Structural duplication
- Voltage margining
- Frequency margining

Page 10: Near-threshold Wide SIMD Architecture: Diet SODA

[Seo et al. ISLPED 2010]

Page 11: Structural Duplication

[Figure: SIMD function units #0 to #9, a crossbar, and datapaths #0 to #7]

8-wide+2-spare system

Increase number of processing resources

Page 12: Structural Duplication

[Figure: SIMD function units #0 to #9, a crossbar, and datapaths #0 to #6 (the Datapath#6 label appears twice)]

8-wide+2-spare system

Use the spares if required.
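
One way to picture the spare selection is the sketch below. It assumes per-unit delays have already been measured and simply routes the fastest function units to the datapaths; the function name `map_lanes`, the delay values, and the selection policy are hypothetical and are not taken from the Diet SODA design.

```python
def map_lanes(fu_delays, width):
    """Choose which function units drive the `width` datapaths.

    Hypothetical policy: keep the `width` fastest units and leave the
    rest unused. Returns (mapping, system_delay), where mapping[d] is
    the index of the function unit routed to datapath d.
    """
    if width > len(fu_delays):
        raise ValueError("not enough function units for the requested width")
    by_speed = sorted(range(len(fu_delays)), key=lambda i: fu_delays[i])
    mapping = sorted(by_speed[:width])  # keep lane order stable
    return mapping, max(fu_delays[i] for i in mapping)

if __name__ == "__main__":
    # Made-up normalized delays for an 8-wide + 2-spare system.
    delays = [1.00, 1.02, 0.98, 1.30, 1.01, 0.99, 1.03, 1.25, 1.04, 1.00]
    mapping, delay = map_lanes(delays, 8)
    print("function unit per datapath:", mapping)
    print("system delay:", delay)
```

In this made-up example the two slowest units (#3 and #7) are left idle, so the system clock is set by the slowest of the remaining eight units.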

Page 13: Structural Duplication – 90nm GP

6 spares are required to match the chip delay of the baseline architecture.
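
A sizing question of this kind can be sketched as a Monte Carlo search for the smallest spare count that meets a delay target. Everything numeric here is an assumption: the Gaussian path delays, the 5% `path_sigma`, the use of the mean chip delay as the metric, and the `target` value itself, since the slides do not define the baseline delay. The function names are hypothetical.

```python
import random
import statistics

def lane_delay(rng, path_sigma, paths=100):
    """Slowest of `paths` Gaussian critical paths (assumed model)."""
    return max(rng.gauss(1.0, path_sigma) for _ in range(paths))

def delay_with_spares(lanes, width, spares):
    """System delay when the `spares` slowest of the first
    width + spares lanes are disabled."""
    usable = sorted(lanes[:width + spares])
    return usable[width - 1]  # slowest lane among the `width` fastest

def spares_needed(target, width=128, max_spares=16,
                  path_sigma=0.05, trials=100, seed=0):
    """Smallest spare count whose mean chip delay is <= target,
    or None if max_spares is not enough."""
    rng = random.Random(seed)
    chips = [
        [lane_delay(rng, path_sigma) for _ in range(width + max_spares)]
        for _ in range(trials)
    ]
    for spares in range(max_spares + 1):
        mean_delay = statistics.mean(
            delay_with_spares(chip, width, spares) for chip in chips
        )
        if mean_delay <= target:
            return spares
    return None

if __name__ == "__main__":
    # The target is a placeholder, not the baseline delay from the talk.
    print(spares_needed(target=1.18))
```

A high percentile of the chip delay could replace the mean in `spares_needed` if yield rather than average speed is the concern.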

Page 14: Voltage Margining

Delay distributions: 45nm PTM model is used

Increase supply voltage
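
As a rough illustration of voltage margining, the sketch below searches for the smallest supply boost that recovers a given variation-induced slowdown. It does not use the 45nm PTM model from the slide; it substitutes a simple alpha-power delay law, and the threshold voltage, the exponent, and the function names are all placeholder assumptions.

```python
def gate_delay(vdd, vth=0.35, alpha=1.5):
    """Relative gate delay under an alpha-power law (assumed model):
    delay is proportional to vdd / (vdd - vth) ** alpha."""
    if vdd <= vth:
        raise ValueError("model only covers vdd above vth")
    return vdd / (vdd - vth) ** alpha

def min_boost(vdd, slowdown, max_boost=0.3, tol=1e-6):
    """Smallest boost (in volts) so that the delay at vdd + boost is
    at most delay(vdd) / slowdown. Found by bisection, which works
    because the modeled delay decreases as vdd rises."""
    target = gate_delay(vdd) / slowdown
    if gate_delay(vdd + max_boost) > target:
        raise ValueError("max_boost is too small to reach the target")
    lo, hi = 0.0, max_boost
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gate_delay(vdd + mid) > target:
            lo = mid  # still too slow, need more voltage
        else:
            hi = mid  # fast enough, try a smaller boost
    return hi

if __name__ == "__main__":
    # Boost needed to win back a hypothetical 5% slowdown at 0.6 V.
    print(f"{min_boost(0.6, 1.05) * 1000:.1f} mV")
```

The steep delay-versus-voltage curve near threshold is what lets a boost of a few millivolts absorb a noticeable slowdown in this model.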

Page 15: Frequency Margining

Increase clock period

Applicable to applications with relaxed timing constraints

For advanced technology nodes, this is impractical

Caveat: consider its impact on the system

SIMD subsystem clock period (Tclk@NTV)

Memory subsystem clock period (Tclk@FV)


Page 16: Structural Duplication vs. Voltage Margining

Page 17: Combination of two schemes – 45nm GP

Design points for a 128-wide system @ 0.6V:

- 26 spares
- 17mV boost
- 5mV boost + 8 spares
- 10mV boost + 2 spares

Page 18: Variation-Aware Diet SODA

Page 19: Conclusions

Near-threshold operation of a wide SIMD system can have timing problems due to process variations.

Variation effects on a 128-wide SIMD architecture are marginal for the 90nm technology node, but could be non-negligible for current and future technology nodes.

A combination of structural duplication and voltage margining provides a minimal power overhead solution to mitigate variation-induced timing problems in wide SIMD architectures.


Page 20: Questions?

Thank you!


Page 21: Backup Slides

Page 22: Local Spares vs. Global Spares

Local sparing, 1 out of 4 (2 spares)

+ small overhead

- cannot handle burst errors

Global sparing (2 spares)

+ handles burst errors

- large overhead

Page 23: Local Spares vs. Global Spares

Global sparing is better than local sparing.

XRAM crossbar supports global sparing.

128 + 8 global spares

128 + 32 local spares (1 out of 4)
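
The difference between the two sparing schemes can be sketched with a small Monte Carlo comparison. It assumes lane delays are independent Gaussian samples with a 5% sigma, which is a simplification not taken from the slides, and the function names are hypothetical. Local sparing lets each group of 4 lanes drop only its own slowest member; global sparing drops the slowest lanes chip-wide.

```python
import random
import statistics

def global_delay(lanes, width):
    """Global sparing: any lane can be replaced, so the `width`
    fastest lanes are kept and the slowest of those sets the delay."""
    return sorted(lanes)[width - 1]

def local_delay(lanes, group_size):
    """Local sparing: lanes form groups of group_size + 1, and each
    group may disable only its own slowest lane."""
    step = group_size + 1
    groups = [lanes[i:i + step] for i in range(0, len(lanes), step)]
    return max(sorted(group)[-2] for group in groups)

def compare(width=128, global_spares=8, group_size=4,
            sigma=0.05, trials=2000, seed=0):
    """Mean system delay for (global, local) sparing."""
    rng = random.Random(seed)
    n_local = width + width // group_size  # one spare per group
    g, l = [], []
    for _ in range(trials):
        g.append(global_delay(
            [rng.gauss(1.0, sigma) for _ in range(width + global_spares)],
            width))
        l.append(local_delay(
            [rng.gauss(1.0, sigma) for _ in range(n_local)], group_size))
    return statistics.mean(g), statistics.mean(l)

if __name__ == "__main__":
    g, l = compare()
    print(f"128 + 8 global: {g:.4f}   128 + 32 local: {l:.4f}")
```

The burst-error weakness of local sparing shows up when two slow lanes land in the same group: `local_delay([1, 9, 9, 3, 4, 1, 2, 2, 3, 4], 4)` is still limited by a slow lane, while `global_delay` over the same ten lanes with a width of 8 discards both.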

Page 24: Variation-Aware Diet SODA

Delay variations can be mitigated with little area and power overhead.