Page 1: TMR Schemes Melanie Berg MEI Technologies/NASA GSFC Melanie.D.Berg@NASA.gov.

TMR Schemes

Melanie Berg

MEI Technologies/NASA GSFC

Melanie.D.Berg@NASA.gov

Voting Matrix

Page 2: Overview

European Space Agency FPGA Tool Workshop. Noordwijk, NL; Melanie Berg

Premise: Why do various FPGAs require separate mitigation strategies?

Radiation Effects in FPGA devices

Mitigation and Actel Anti-fuse Devices

Mitigation and Xilinx Virtex Devices

Tools

Page 3: Radiation Effects in FPGA devices

•Single Event Transients (SETs)

•Single Event Upsets (SEUs)

•Single Event Functional Interrupts (SEFIs)

Page 4: Single Event Effects (SEEs) and IC System Error

SEUs or SETs can occur in:

•Combinatorial Logic

•Sequential Logic

•Configuration Memory Cells

Depending on the device and the design, each fault type will:

•Have a probability of occurrence

•Make either a significant or an insignificant contribution to system error

Every device has different error responses. We must understand the differences and design appropriately.

Page 5: Combinatorial Logic Blocks and Potential Upsets… SETs in Anti-fuse FPGAs

Page 6: Basic Combinatorial Logic Blocks and Potential Upsets

[Figure: combinatorial logic block with two fault types. TRANSIENT: labelled $P_{SET}$. STUCK UNTIL OVERWRITTEN: Probability of Configuration Fault, $P_{Configuration}$.]

Page 7: DFFs: SEUs and SEFIs

[Figure: DFF with D, Q, reset, and CLK pins. Strike Caught in Loop.]

Probability of SEU: $P_{DFFSEU}$

Probability of SEFI: $P_{SEFI}$

Page 8: Transient Capture on a DFF Data Input Pin (SET→SEU)

[Figure: DFF with a SET pulse of width $T_{pulse}$ on the data input D, shown against the clock; the clock period is tp = 1/fs.]

$$P(fs)_{SET \to SEU} \propto P(fs)_{SETgen} \cdot P(fs)_{SETprop} \cdot P_{DFFEn} \cdot T(fs)_{pulse}$$

$fs$: System Frequency

$T(fs)_{pulse}$: SET Pulse Width

$P(fs)_{SETgen}$: Probability SET generated with sufficient amplitude

$P(fs)_{SETprop}$: Probability SET can propagate with sufficient amplitude

$P_{DFFEn}$: Probability DFF is enabled (active)

$P(fs)_{SET \to SEU}$: Probability SET can be caught by clock edge
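
As a rough illustration of why the capture term depends on fs, the sketch below uses a common first-order model: the chance that a randomly timed pulse overlaps a capturing clock edge is taken as the fraction of the clock period tp = 1/fs that the pulse occupies. This model, the function name, and the 1 ns pulse width are assumptions for illustration and are not taken from the slide.

```python
def capture_window_fraction(t_pulse_s: float, fs_hz: float) -> float:
    """Fraction of the clock period tp = 1/fs covered by a SET pulse, capped at 1."""
    tp = 1.0 / fs_hz
    return min(1.0, t_pulse_s / tp)


# Hypothetical 1 ns SET pulse at two system frequencies.
low = capture_window_fraction(1e-9, 15e6)
high = capture_window_fraction(1e-9, 120e6)
print(low, high)
```

Under this assumption the same pulse is more likely to be latched at the higher clock rate, which is the qualitative behaviour the frequency-dependent term describes.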

Page 9: Frequency Effects and Conventional DFF Upset Theory

$$P_{DFF}(fs)_{error} \propto P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{DFFMBU}$$

[Figure: DFF error versus Frequency. The Composite Cross Section $P_{DFF}(fs)_{error}$ is shown together with $P_{DFFSEU}$ & $P_{DFFMBU}$ and $P(fs)_{SET \to SEU}$; one region is marked ~0.]

Page 10: Summary: Most Significant Factors of System Error Probability $P(fs)_{error}$

$$P(fs)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{SEFI}$$

•Configuration, $P_{Configuration}$: SRAM Based FPGAs

•DFFs, $P_{DFFSEU}$: STATIC SEU

•DFFs, $P(fs)_{SET \to SEU}$: Dynamic SET→SEU

•SEFIs, $P_{SEFI}$: Clocks & Resets; Inaccessible control circuitry

Page 11: Reducing System Error: Common Mitigation Techniques

Mitigation can be:

•Embedded: built into the device library cells. The user does not verify the mitigation; the manufacturer does.

•User inserted: part of the actual design process. The user must verify the mitigation… Complexity is a RISK!

Common Mitigation Types:

•Local Triple Modular Redundancy (LTMR)

•Global Triple Modular Redundancy (GTMR)

$$P(fs)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{SEFI}$$

Page 12: Example Mitigation Schemes will use Majority Voting

I0  I1  I2  Majority Voter
0   0   0   0
0   0   1   0
0   1   0   0
0   1   1   1
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1

$$MajorityVoter = I_1 \cdot I_2 + I_0 \cdot I_2 + I_0 \cdot I_1$$
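
The voting equation can be checked against the truth table with a few lines of Python. This is a minimal sketch; the function name `majority_voter` is chosen here for illustration.

```python
from itertools import product


def majority_voter(i0: int, i1: int, i2: int) -> int:
    """Two-out-of-three vote: I1*I2 + I0*I2 + I0*I1 on single bits."""
    return (i1 & i2) | (i0 & i2) | (i0 & i1)


# Enumerate all eight input combinations in the same order as the table.
truth_table = [
    (i0, i1, i2, majority_voter(i0, i1, i2))
    for i0, i1, i2 in product((0, 1), repeat=3)
]
for row in truth_table:
    print(*row)
```

Any single wrong input is outvoted by the other two, which is the property every TMR scheme in this deck relies on.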

Page 13: Mitigation and Actel Anti-fuse Devices

Page 14: ACTEL RTAX-S Architecture Basics

Embedded RHBD:

•Hardened Global Clocks and Resets

•Antifuse Configuration is SEU immune

•Embedded Localized TMR (LTMR) at each DFF (RCELL)

Super Cluster:

•Combinatorial Cells: C CELLS

•DFF Cells: R Cells

Source: RTAX-S/SL RadTolerant FPGAs 2009 Actel.com

Page 15: Local Triple Modular Redundancy (LTMR): Smallest Area & Power

Triple Each DFF + Vote…

Data paths are not redundant – can only have one voter

Unprotected:

•Clocks and Resets… SEFI

•Transients (SET->SEU)

•Internal/hidden device logic: SEFI

$$P(fs)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{SEFI}$$

[Figure annotations on the equation: one term marked Low; legend: Non-Mitigated, Mitigated.]
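
The behaviour described on this page can be sketched in Python as a behavioural model, not as device logic. The class name `LtmrDff` and its methods are hypothetical. The model shows an upset in one of the three flip-flop copies being outvoted, and a transient on the shared, non-redundant data path being captured by all three copies.

```python
class LtmrDff:
    """Behavioural sketch of one LTMR flip-flop: three copies plus a voter."""

    def __init__(self) -> None:
        self.copies = [0, 0, 0]

    def clock(self, d: int) -> None:
        # The data path is not redundant: all three copies load the same D.
        self.copies = [d, d, d]

    def upset(self, index: int) -> None:
        # Model an SEU by flipping one stored copy.
        self.copies[index] ^= 1

    @property
    def q(self) -> int:
        a, b, c = self.copies
        return (a & b) | (a & c) | (b & c)


ff = LtmrDff()
ff.clock(1)
ff.upset(0)
after_single_upset = ff.q      # the other two copies outvote the flipped one

ff.clock(0)                    # a transient turned the intended 1 into 0 at D
after_captured_transient = ff.q
print(after_single_upset, after_captured_transient)
```

The second case is why the SET→SEU term stays unprotected under LTMR: the voter cannot tell a captured transient from valid data.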

Page 16: ACTEL RTAX-S Embedded Mitigation… LTMR and SETs

[Figure: Super Cluster layout with combinatorial logic (C-CELL) and sequential logic (R-CELL) blocks; elements are labelled C, R, B, RX, and TX.]

Page 17: RTAX Example: Probability of Error Reduction

Error Probability is Per DFF bit

Error Rate must reflect frequency of operation

$$P(fs)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{SEFI}$$

[Figure annotations on the equation terms: Low, ~0, 0.]

Page 18: Upper-Bound Error Prediction RHBD Anti-fuse FPGA

DFF (near) Static Error Bit Rate, no CCells, $P_{DFFSEU}$:

$$\frac{dE_{bit}}{dt} < 1 \times 10^{-10} \ \frac{Errors}{bit \cdot day}$$

Source: Actel

15MHz to 120MHz: Dynamic Error Bit Rate with 8 levels of CCells, $P(fs)_{SET \to SEU}$:

$$1 \times 10^{-9} < \frac{dE_{bit}(fs)}{dt} < 6 \times 10^{-8} \ \frac{Errors}{bit \cdot day}$$

Source: NASA Goddard

Page 19: Upper-Bound Error Prediction Actel RHBD Anti-fuse FPGA

With embedded LTMR Mitigation + Hardened Clocks:

$$P(fs)_{error} \propto P(fs)_{SET \to SEU}$$

$$\frac{dE}{dt} = \frac{dE_{bit}(fs)}{dt} \cdot \#UsedDFFs$$

$$\frac{dE}{dt} < 6 \times 10^{-8} \ \frac{Errors}{bit \cdot day} \cdot n \ \frac{bits}{design}$$

$$\frac{dE}{dt} < 3 \times 10^{-3} \ \frac{Errors}{design \cdot day}$$

Thousands of years in LEO!
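
The scaling from a per-bit rate to a per-design rate is a single multiplication, sketched below. The per-bit bound of 6 × 10^-8 errors per bit-day is the figure quoted for the dynamic error bit rate; the count of 50,000 used DFFs is a hypothetical value picked for illustration because it lands on 3 × 10^-3 errors per design-day, and is not stated in the slides.

```python
PER_BIT_RATE = 6e-8  # errors per bit-day, upper bound of the dynamic bit rate


def design_error_rate(per_bit_rate: float, used_dffs: int) -> float:
    """Upper-bound design error rate: per-bit rate times the number of used DFFs."""
    return per_bit_rate * used_dffs


n_used_dffs = 50_000  # hypothetical design size, not from the slides
rate = design_error_rate(PER_BIT_RATE, n_used_dffs)
print(f"{rate:.1e} errors per design-day")
```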

Page 20: Mitigation and Xilinx Virtex Devices

Page 21: Xilinx XQR4VSX55: Radiation Test Data

For non-mitigated designs the most significant upset factor is: $P_{Configuration}$

                                  Probability            Error Rate                 LEO           GEO
Configuration Memory: XQR4VSX55   $P_{Configuration}$    $dE_{Configuration}/dt$    7.43          4.2
Combined SEFIs per device         $P_{SEFI}$             $dE_{SEFI}/dt$             7.5 × 10^-5   2.7 × 10^-5

Error rates are in Upsets/(device·day).

Xilinx Consortium: VIRTEX-4VQ STATIC SEU CHARACTERIZATION SUMMARY: April/2008

M Berg, Trading ASIC and FPGA Considerations for System Insertion; IEEE Nuclear Science Radiation Effects Conference 2009
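
To put the tabulated rates in perspective, the sketch below converts each rate in upsets per device-day into a mean time between upsets. The rates are the four table values; the dictionary layout and function name are illustrative choices.

```python
# Upsets per device-day, from the XQR4VSX55 table.
RATES = {
    ("Configuration", "LEO"): 7.43,
    ("Configuration", "GEO"): 4.2,
    ("SEFI", "LEO"): 7.5e-5,
    ("SEFI", "GEO"): 2.7e-5,
}


def mean_days_between_upsets(rate_per_day: float) -> float:
    """Mean time between upsets in days for a constant upset rate."""
    return 1.0 / rate_per_day


for (kind, orbit), rate in RATES.items():
    days = mean_days_between_upsets(rate)
    print(f"{kind:13s} {orbit}: {days:.4g} days ({days * 24:.4g} hours)")
```

Configuration upsets arrive every few hours in an unmitigated device, while SEFIs are tens of years apart, which is why configuration memory dominates the non-mitigated error rate.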

Page 22: Global Triple Modular Redundancy (GTMR): Largest Area → Greatest Complexity

•Triple Entire Design

•Triple I/O and Voters

•Unprotected – hidden device logic SEFIs

•Cannot be an embedded strategy: Complex to verify

•Xilinx offers XTMR

$$P(fs)_{error} \propto P_{Configuration} + P_{DFFSEU} + P(fs)_{SET \to SEU} + P_{SEFI}$$

[Figure annotations on the equation: three terms marked Low; legend: Non-Mitigated, Mitigated.]

Page 23: XTMR – Capturing Asynchronous Input data

[Figure: three redundant domains with inputs Async_data_tr0, Async_data_tr1, and Async_data_tr2. Each domain has a Metastability Filter and an Edge Detect Circuit built from DFFs, and the three domains feed a VOTER.]

[Figure: EDGE DETECT TIMING WAVEFORM over clock cycles n to n+5, showing INPUT: Async_DATA_tr0, INPUT: Async_DATA_tr1, and INPUT: Async_DATA_tr2 with INPUT SKEW, the signals Edge_detect_tr0, Edge_detect_tr1, and Edge_detect_tr2, and the Voted rising edge detect.]

Dynamic Analysis:

•One domain leads the other two

Page 24: Time Domain Considerations: XTMR Single Bit Failures… Not Detected by Static Node Analysis

[Figure: timing waveform over clock cycles n to n+5, showing INPUT: Async_DATA_tr0, INPUT: Async_DATA_tr1, and INPUT: Async_DATA_tr2, the signals Edge_detect_tr0, Edge_detect_tr1, and Edge_detect_tr2, and the Voted rising edge detect. With a CONFIGURATION BIT HIT the result is NO EDGE DETECTION.]

Page 25: Voters and Asynchronous Signal Capture

[Figure: triplicated asynchronous capture circuits built from Metastability Filter, Edge Detect Circuit, and VOTER blocks.]

Place voter after metastability filters

It satisfies skew constraints because voter is anchored at DFF control points

[Figure: timing waveform over clock cycles n+1 to n+5, showing INPUT: Async_DATA_tr0, INPUT: Async_DATA_tr1, and INPUT: Async_DATA_tr2, the VOTER output, and Edge Detect.]
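
The failure mode and the fix can be reproduced with a cycle-level sketch. It is a simplified behavioural model under stated assumptions, not the XTMR netlist: each domain is reduced to the level it sees after its metastability filter, one domain sees the input one clock earlier than the other two, and a configuration bit hit is modelled as one lagging domain stuck low. All function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def filtered_level(arrival: int, n_cycles: int, stuck_low: bool = False) -> list:
    """Level seen by one domain after its metastability filter, per clock cycle."""
    return [0 if stuck_low else int(t >= arrival) for t in range(n_cycles)]


def rising_edges(levels: list) -> list:
    """One-cycle pulse wherever the level goes from 0 to 1."""
    pulses, prev = [], 0
    for value in levels:
        pulses.append(value & (1 - prev))
        prev = value
    return pulses


def vote(streams: list) -> list:
    return [majority(a, b, c) for a, b, c in zip(*streams)]


def vote_after_edge_detect(levels: list) -> list:
    return vote([rising_edges(domain) for domain in levels])


def vote_before_edge_detect(levels: list) -> list:
    return rising_edges(vote(levels))


N = 8
# Domain 0 leads the other two by one clock cycle (input skew).
healthy = [filtered_level(2, N), filtered_level(3, N), filtered_level(3, N)]
# Same skew, with a fault holding domain 1 low.
faulty = [filtered_level(2, N), filtered_level(3, N, stuck_low=True), filtered_level(3, N)]

print(sum(vote_after_edge_detect(healthy)))   # edges seen with no fault
print(sum(vote_after_edge_detect(faulty)))    # edges seen when voting on pulses
print(sum(vote_before_edge_detect(faulty)))   # edges seen when voting on levels
```

With the fault present, the two surviving edge pulses fall in different cycles and never reach a two-out-of-three majority, so voting on the pulses loses the edge. Voting on the filtered levels first recovers it, because levels overlap in time even when their edges do not.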

Page 26: Upper-Bound Error Prediction: Xilinx FPGA XTMR

$P_{Configuration}$ ???

•SEUs are insignificant

•MBUs may be insignificant (still under investigation)

•Assumes proper scrubbing

Assumes Unmitigated SEFIs are the most predominant source:

$$P(fs)_{error} \propto P_{SEFI}$$

$$\frac{dE_{SEFI}}{dt} \approx 3 \times 10^{-5} \ \frac{Errors}{Device \cdot day} \cdot n \ Devices$$

$$\frac{dE}{dt} \approx \frac{dE_{SEFI}}{dt} \approx 3 \times 10^{-5} \cdot n \ \frac{Errors}{day}$$

Page 27: Tools

Page 28: Mitigation and Actel Tools

Mentor Graphics has offered LTMR for anti-fuse devices

There is a desire to apply LTMR to Actel Flash-based products

DTMR is another approach (GTMR with no clock redundancy)

•Flash

•Assist with SETs in Anti-fuse Devices

Page 29: Mitigation and Xilinx Tools

Currently XTMR is commercially available from Xilinx

NASA REAG has identified some issues:

•Asynchronous domain crossings

•Verification of XTMR insertion

Mentor is now evaluating GTMR with Formal Checking

NASA REAG is expecting to use Mentor GTMR (preliminary version) for V5 radiation testing