
University of Michigan, Electrical Engineering and Computer Science

Cost-Efficient Soft Error Protection for Embedded Microprocessors

Jason Blome (1), Shuguang Feng (1), Shantanu Gupta (1), Scott Mahlke (1), Daryl Bradley (2)

(1) University of Michigan
(2) ARM, Ltd.


The Soft Error Problem

[Figure: clocked flip-flop (D, Q, CLK) showing a transient fault becoming a soft error]


Fault Masking

• Logical: the faulted value does not affect the logical operation of the circuit

• Latching-Window: the fault pulse does not reach a state element within the latching window

• Electrical: the fault pulse is electrically attenuated by subsequent gates in the circuit

• Architectural/Software: incorrect state is overwritten before it is read

[Figure: clock waveform (CLK) marking tsetup and thold; register file (entries 0-5) with decoder, stepping through mov r5, 8; mov r2, 4; add r6, r2, r5]
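Two of the masking categories above lend themselves to a small illustration. The following is a minimal sketch, not taken from the slides: `and_gate`, `logically_masked`, and `run` are hypothetical helpers, and the toy register machine only understands the `mov` and `add` forms used in the slide's example. It shows logical masking at an AND gate and architectural masking when a corrupted register is overwritten before it is read.

```python
def and_gate(a: int, b: int) -> int:
    return a & b


def logically_masked(a: int, b: int) -> bool:
    """True if a transient flip on input b leaves the AND output unchanged."""
    return and_gate(a, b) == and_gate(a, b ^ 1)


def run(program, fault=None):
    """Execute a tiny program on an 8-entry register file.

    fault is an optional (step, reg, xor_mask) tuple (hypothetical format):
    the bits in xor_mask are flipped in reg just before instruction `step`.
    """
    regs = [0] * 8
    for step, instr in enumerate(program):
        if fault is not None and fault[0] == step:
            regs[fault[1]] ^= fault[2]
        if instr[0] == "mov":        # ("mov", dst, immediate)
            regs[instr[1]] = instr[2]
        elif instr[0] == "add":      # ("add", dst, src_a, src_b)
            regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
    return regs


# mov r5, 8 ; mov r2, 4 ; add r6, r2, r5
PROGRAM = [("mov", 5, 8), ("mov", 2, 4), ("add", 6, 2, 5)]

golden = run(PROGRAM)
masked = run(PROGRAM, fault=(0, 5, 0xFF))   # r5 corrupted, then overwritten by mov
visible = run(PROGRAM, fault=(2, 5, 0x01))  # r5 corrupted just before add reads it
```

In this sketch the first fault is architecturally masked because `mov r5, 8` overwrites the bad value, while the second reaches `r6` because the add reads `r5` before anything rewrites it.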


Soft Error Rate Trends

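[Figure: soft error rate trends (Shivakumar 2002)]

Soft Error Rate Contributions

[Figure: pie chart (Mitra 2005) with segments labeled Static Combinational Logic, Unprotected SRAMs, and Sequential Elements; values shown: 49%, 11%, 40%]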

Increasing contribution of faults in combinational logic to the overall soft error rate


Outline

• Soft error analysis setup
• Summary of fault analysis results
• Fault tolerance techniques
► Register value cache
► Strategic deployment of fault detectors
• Conclusion


Fault Analysis Framework

[Figure: fault injection/error analysis framework. A testbench runs a benchmark on a reference design and a test design (ARM926EJ-S), with a fault injection scheduler, error checking and logging, and report generation. Core blocks shown: Instruction Fetch, Instruction Decode, Register Bank, ALU, Multiply, Shift, Mux Array, Instruction Address Logic, Data Address Logic, Data Interface, Instruction cache and Data cache (each with an MMU), Bus Interface, Write Buffer/Bus Interface]
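The reference-versus-test arrangement in the framework can be sketched in a few lines. This is a toy model under stated assumptions, not the authors' framework: `step` is a hypothetical 16-bit stand-in for one cycle of the simulated design (only its low byte feeds the next state, so upper-bit faults are masked), and `run_experiment` and `campaign` are illustrative names.

```python
import random


def step(state: int, cycle: int) -> int:
    """Hypothetical one-cycle update; only the low byte influences the next state."""
    return ((state & 0xFF) * 3 + cycle) & 0xFFFF


def run_experiment(cycles: int, inject_cycle: int, bit: int):
    """Run reference and test designs in lockstep, flipping one bit in the test design.

    Returns a log of (cycle, reference_state, test_state) for every mismatching cycle.
    """
    ref = test = 1
    log = []
    for cycle in range(cycles):
        if cycle == inject_cycle:   # fault injection scheduler
            test ^= 1 << bit
        ref = step(ref, cycle)
        test = step(test, cycle)
        if ref != test:             # error checking and logging
            log.append((cycle, ref, test))
    return log


def campaign(trials: int, cycles: int = 20, seed: int = 0) -> float:
    """Fraction of randomly placed single-bit faults that cause any state mismatch."""
    rng = random.Random(seed)
    erroneous = 0
    for _ in range(trials):
        log = run_experiment(cycles, rng.randrange(cycles), rng.randrange(16))
        erroneous += bool(log)
    return erroneous / trials
```

A fault in one of the upper eight bits never shows up in the log here, which mirrors the idea that many injected faults are masked before they reach observable state.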


Observed Error Rates

Faults Occurring in Registers

Error Site                  Error Rate
Microarchitectural State    94%
Architectural State         7%

Faults Occurring in Combinational Logic

Error Site                  Error Rate
Microarchitectural State    16%
Architectural State         4%

At the software interface (architectural state), the two error rates are within 3% of each other (7% vs. 4%)


Impact of Fault Injection

[Figure: number of errors (0 to 50) versus cycle (0 to 20). Series: Comb. Logic: Microarchitectural State Errors; Comb. Logic: Architectural State Errors; Seq. State: Microarchitectural State Errors; Seq. State: Architectural State Errors]


Targeting the Faults that Count

• ARM926EJ-S register file consumes 8.7% of total core area

► Responsible for 57.4% of architectural errors

• Register file area dominated by combinational logic

► ECC cost, efficacy?


The Register Value Cache

[Figure: register file (entries 0-5, with decoder) alongside the register value cache. Labels: Read/Write Addr/Data, Read Result, three comparators (CMP), Stall/Check CRC]


The Register Value Cache

[Figure: register value cache internals. Labels: Valid, Read/Write Addr, Read Data, Write Data, Index Array, Value Array, Previous Read Values, CRC, CMP, Error. Three operations are highlighted: Read Operation, Write Operation, Check Operation]


Example

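[Figure: worked example with the register file (entries 0-5, with decoder) and the register cache. Two instruction sequences are shown: mov r5, 8; mov r2, 4; add r3, r1, r4 and mov r5, 8; mov r2, 4; add r3, r2, r5. The cache holds the values 4 and 8, each with a CRC, and a Check CRC step is shown]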
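The read, write, and check operations of the register value cache can be sketched as follows. This is a minimal model under several assumptions that the slides do not confirm: a CRC is stored per register at write time, the cache uses LRU replacement, a read that hits is checked by comparison (CMP), and a read that misses stalls for a CRC check. `zlib.crc32` stands in for whatever CRC the hardware uses, and `ProtectedRegisterFile` is a hypothetical name.

```python
import zlib
from collections import OrderedDict


def crc(value: int) -> int:
    """Stand-in CRC over a 32-bit register value (assumption: zlib's CRC-32)."""
    return zlib.crc32((value & 0xFFFFFFFF).to_bytes(4, "little"))


class ProtectedRegisterFile:
    def __init__(self, num_regs: int = 16, cache_size: int = 4):
        self.regs = [0] * num_regs
        self.crcs = [crc(0)] * num_regs
        self.rvc = OrderedDict()      # register index -> cached value, in LRU order
        self.cache_size = cache_size
        self.stalls = 0               # number of reads that needed a CRC check

    def _cache(self, idx: int, value: int) -> None:
        self.rvc[idx] = value
        self.rvc.move_to_end(idx)
        if len(self.rvc) > self.cache_size:
            self.rvc.popitem(last=False)   # evict least recently used entry

    def write(self, idx: int, value: int) -> None:
        """Write operation: update the register, its CRC, and the cache."""
        self.regs[idx] = value
        self.crcs[idx] = crc(value)
        self._cache(idx, value)

    def read(self, idx: int):
        """Read operation: returns (value, error_flag)."""
        value = self.regs[idx]
        if idx in self.rvc:
            error = self.rvc[idx] != value          # CMP against the cached copy
            self.rvc.move_to_end(idx)
        else:
            self.stalls += 1                        # Stall/Check CRC
            error = crc(value) != self.crcs[idx]
            if not error:
                self._cache(idx, value)
        return value, error


# mov r5, 8 ; mov r2, 4 ; add r3, r2, r5
rf = ProtectedRegisterFile()
rf.write(5, 8)
rf.write(2, 4)
a, err_a = rf.read(2)
b, err_b = rf.read(5)
rf.write(3, a + b)
```

In this model both source operands of the add hit in the cache, so no stall is needed; a bit flip in `rf.regs[5]` before the read would raise the error flag through the comparison path.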


RVC Fault Coverage

57.4%


RVC Overhead


What About the Rest?

• Leverage fault fanout to place detectors at likely targets
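One way to read "place detectors at likely targets" is as a coverage problem: given which flip-flops each fault site can reach, pick the few flip-flops that catch the most faults. The sketch below uses a greedy selection, which is an assumption for illustration and not necessarily the placement method used in this work; `place_detectors`, the `fanout` map format, and the site and flip-flop names are all hypothetical.

```python
def place_detectors(fanout, budget):
    """Greedily choose up to `budget` flip-flops to guard.

    fanout maps each fault site to the set of flip-flops its fault can reach.
    Returns (chosen_flip_flops, fraction_of_fault_sites_covered).
    """
    uncovered = set(fanout)
    candidates = set().union(*fanout.values())
    chosen = []
    while uncovered and len(chosen) < budget:
        remaining = sorted(candidates - set(chosen))   # sorted for deterministic ties
        if not remaining:
            break
        best = max(remaining,
                   key=lambda ff: sum(ff in fanout[site] for site in uncovered))
        gained = {site for site in uncovered if best in fanout[site]}
        if not gained:
            break
        chosen.append(best)
        uncovered -= gained
    total = len(fanout)
    coverage = (total - len(uncovered)) / total if total else 0.0
    return chosen, coverage


# Hypothetical fault sites g1..g5 and flip-flops ffA..ffD.
example_fanout = {
    "g1": {"ffA", "ffB"},
    "g2": {"ffB"},
    "g3": {"ffB", "ffC"},
    "g4": {"ffC"},
    "g5": {"ffD"},
}
chosen, coverage = place_detectors(example_fanout, budget=2)
```

With a budget of two detectors, this toy input selects `ffB` and then `ffC`, covering four of the five fault sites, which illustrates the coverage-versus-cost tradeoff the talk goes on to discuss.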


Fault Fanout


Transient Fault Detector

[Figure: transient fault detector. A main flip-flop (D, CLK, Q) is paired with a shadow latch fed through a delay, with an Error output. Source: S. Das, "A Self-Tuning DVS Processor Using Delay-Error Detection and Correction," 2006]
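The detector's behavior can be approximated by sampling the data input twice: once at the clock edge (main flip-flop) and once a fixed delay later (shadow latch), then flagging a mismatch. The sketch below is a behavioral model under that assumption; times are integers in arbitrary units, and `make_signal` and `detect` are hypothetical helpers rather than anything from the cited design.

```python
def make_signal(base: int, glitch_start=None, glitch_width: int = 0):
    """Return D(t): a constant bit `base`, inverted during an optional glitch window."""
    def d(t: int) -> int:
        if glitch_start is not None and glitch_start <= t < glitch_start + glitch_width:
            return base ^ 1
        return base
    return d


def detect(d, clock_edge: int, shadow_delay: int):
    """Sample D at the clock edge and again after the shadow delay.

    Returns (main_value, shadow_value, error_flag).
    """
    main = d(clock_edge)
    shadow = d(clock_edge + shadow_delay)
    return main, shadow, main != shadow


EDGE, DELAY = 1000, 200

clean = detect(make_signal(1), EDGE, DELAY)
short_glitch = detect(make_signal(1, glitch_start=950, glitch_width=100), EDGE, DELAY)
long_glitch = detect(make_signal(1, glitch_start=950, glitch_width=500), EDGE, DELAY)
```

In this model a pulse shorter than the shadow delay is caught because only the main sample sees it, while a pulse that spans both sampling points escapes detection, so the delay sets the widest glitch the detector can catch.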


Glitch Detector Coverage

[Figure: two panels, Power and Area, each plotting coverage against percent overhead]


Combined Technique Coverage

[Figure: two panels, Power and Area, each plotting coverage against percent overhead]


Conclusion

• Circuit-level soft error analysis offers significant insight
• Faults in combinational logic do not require structural duplication
► Coverage versus cost tradeoffs available
► Significant benefits in compromise
• 85% fault coverage for only 5.5% area overhead
► 2-3x increase in MTTF


Questions?


RVC Hit Rates

[Figure: hit rate (0.7 to 1) versus cache size (6 to 16) for cjpeg, djpeg, epic, unepic, g721decode, g721encode, pegwitdecode, pegwitencode, rawcaudio, rawdaudio, and the average]
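A hit-rate curve of this kind can be produced by replaying a trace of register accesses against caches of different sizes. The sketch below assumes LRU replacement and a made-up access trace, neither of which is stated on the slide; `hit_rate` is a hypothetical helper.

```python
from collections import OrderedDict


def hit_rate(trace, cache_size: int) -> float:
    """Fraction of accesses in `trace` that hit an LRU cache with `cache_size` entries."""
    cache = OrderedDict()
    hits = 0
    for reg in trace:
        if reg in cache:
            hits += 1
            cache.move_to_end(reg)         # mark as most recently used
        else:
            cache[reg] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace) if trace else 0.0


# Made-up sequence of register indices, for illustration only.
TRACE = [2, 5, 2, 5, 3, 2, 5, 1, 2, 5]
rates = {size: hit_rate(TRACE, size) for size in (2, 4)}
```

Sweeping `cache_size` over the range on the x-axis and averaging across benchmark traces would give one point per size, which is the shape of data the plot reports.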