
SEU Mitigation of a Soft Embedded Processor in the Virtex-II FPGAs

Sana Rezgui (1), Jeffrey George (2), Gary Swift (3), Kevin Somervill (4), Carl Carmichael (1) and Gregory Allen (3)

(1) Xilinx, Inc., San Jose, CA

(2) The Aerospace Corporation, El Segundo, CA

(3) Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA

(4) NASA Langley, Hampton, VA

For the North American Xilinx Test Consortium

Rezgui 2 MAPLD 2005/E238

Objective

• Use of embedded system applications built on SRAM-based FPGAs (S-FPGAs) in radiation environments => SEU mitigation and design implementation

• Mitigated Design Performances
― Simplicity, flexibility and automation
― Area and timing performances

• Upset Sensitivity in Radiation Environment
― Characterization of the FPGA sensitivity in beam
― Evaluation of the proposed mitigation solution for the embedded design

Measure the in-beam performance of an upset mitigation technique applied to a complex design (a processor) implemented on an FPGA and running a computationally intensive benchmark program

Studied Case

SEU mitigation of the Xilinx MicroBlaze soft IP processor by means of the Triple Modular Redundancy (TMR) technique

[Figure: Virtex-II FPGA fabric hosting the MicroBlaze, with labeled resources: Configurable Logic Blocks (CLBs), Block RAM, 18-bit multipliers, programmable I/Os and Digital Clock Manager]

Internal Architecture

MicroBlaze is a 32-bit RISC processor with a Harvard bus architecture

MicroBlaze Mitigation

1. Use the TMR technique to mitigate the design against SEUs
• MicroBlaze designs consist of I/Os, Look-Up Tables (LUTs), Flip-Flops (FFs) and user memory elements.
• To the TMR Tool (developed by Xilinx), MicroBlaze is no different from any other design.

2. Run active readback and continuous scrubbing of all the static resources in use for error detection and correction
• This is transparent to, and independent of, the running design.
• User memory elements cannot be scrubbed from the configuration port.
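The voting step at the heart of TMR can be illustrated in software. The sketch below is a minimal bitwise 2-of-3 majority voter over three redundant copies of a 16-bit word. The function name `majority_vote`, the 16-bit width and the test value are illustrative assumptions; they are not taken from the TMR Tool, which inserts voters in the FPGA netlist.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


# Three redundant domains (hypothetical names tr0..tr2) hold the same 16-bit value.
golden = 0xBEEF
tr0, tr1, tr2 = golden, golden, golden

# Model a single-event upset: flip bit 3 in one domain only.
tr1 ^= 1 << 3

# The two unaffected domains outvote the upset one.
voted = majority_vote(tr0, tr1, tr2)
print(hex(voted))
```

With a single corrupted domain the voted word matches the original value. The vote only masks the upset; the corrupted copy stays wrong until something rewrites it, which is the role scrubbing plays in item 2.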

Internal Architecture

User memory elements: SRL16s, Distributed Memory (LUT-RAM), BRAMs
• Active readback causes problems with user memory elements (dynamic content)
• Static partial reconfiguration of a BRAM is not possible if it stores program data in addition to the code

[Figure: MicroBlaze internal architecture with the user memory elements labeled: LUT-RAMs, SRL16s and BRAMs]

User Memory Mitigation

• Error Detection and Correction (EDAC)
― Additional decoding logic would be required
― Depends on the speed of detection and correction of upsets

• Replacement of the user memory elements by FFs and LUTs
― SRL16s are automatically replaced by FFs and LUTs by the TMR Tool
― Distributed RAMs (LUT-RAMs) are not automatically replaced: a custom macro is required to replace them with FFs and LUTs

• Triple Modular Redundancy and self-correction of the BRAMs
― Done automatically by the TMR Tool, which replaces each BRAM with a custom macro that scrubs the BRAM itself

• EDAC and TMR can be defeated by error accumulation

BRAM Mitigation Methodology

1. Apply TMR on the used BRAMs
2. Insert an internal scrub controller that rewrites the 3 BRAMs with their voted output value

• Mitigation requirement: only one BRAM port can be used by the MicroBlaze design
• Each Block RAM is replaced with the TMRed BRAMs and the internal BRAM scrubber controller
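The idea of scrubbing three BRAM copies with their voted value, and why error accumulation matters, can be sketched in software. This is a minimal model under stated assumptions: each BRAM is a Python list of 16-bit words, and the helper names `vote`, `scrub`, `read` and `fresh` and the memory contents are hypothetical. It does not reproduce the actual scrubber macro.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def scrub(brams):
    """One scrub pass: rewrite every address of all three copies with the voted value."""
    for addr in range(len(brams[0])):
        v = vote(brams[0][addr], brams[1][addr], brams[2][addr])
        for copy in brams:
            copy[addr] = v


def read(brams, addr):
    """What the design sees: the voted output of the three copies."""
    return vote(brams[0][addr], brams[1][addr], brams[2][addr])


def fresh(contents):
    """Build three independent copies of the same memory image."""
    return [list(contents) for _ in range(3)]


contents = [0x1234, 0x00FF, 0xA5A5, 0x0F0F]  # hypothetical memory image

# Case 1: a scrub pass runs between two upsets of the same bit in different domains.
brams = fresh(contents)
brams[0][2] ^= 1 << 7   # upset in domain 0
scrub(brams)            # the voted value repairs domain 0
brams[1][2] ^= 1 << 7   # later upset in domain 1
with_scrub = read(brams, 2)

# Case 2: both upsets accumulate before any scrub pass.
brams = fresh(contents)
brams[0][2] ^= 1 << 7
brams[1][2] ^= 1 << 7
without_scrub = read(brams, 2)

print(hex(with_scrub), hex(without_scrub))
```

In the first case the voted read still returns the original word; in the second, two domains agree on the wrong bit and the vote is defeated, which is the error accumulation that EDAC and TMR are vulnerable to.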

EDK / TMR Tool Design Flow

[Figure: design flow in three stages: Design Entry (EDK/ISE), XTMR Conversion (TMR Tool, including LUT-RAM and BRAM macro replacement) and Implementation (ISE: NGDBuild, MAP, PAR, BitGen / BitInit). Files shown in the flow: .ngc, .bmm, .elf, .edf, .ngo and a manually edited .ucf]

Implementation and Performance (1): Virtex-II 6000 Used Internal Resources

[Figure: bar chart of the percentage of Virtex-II 6000 resources used (FFs, LUTs, GCLKs, IOs, MULTs, BRAMs) for each design type: single-string MicroBlaze, mitigated MicroBlaze design with LUT-RAM, mitigated MicroBlaze design without LUT-RAMs, and fully mitigated design]

Implementation and Performance (2): Timing Performance and Core-Voltage Current Consumption

Tested Design | Maximum Frequency (MHz) | Current Consumption (A)
Single-string MicroBlaze (Phase 1) | 77 | 0.37
Mitigated MicroBlaze design before replacement of LUT-RAM (Phase 2) | 66 | 0.78
Mitigated MicroBlaze design after replacement of LUT-RAM (Phase 3) | 66 | 0.83
Fully mitigated design (Phase 4) | 66 | 0.99
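The cost of mitigation relative to the single-string design can be read off the figures above with a little arithmetic. The sketch below only re-derives ratios from the reported maximum frequencies and core currents; the dictionary layout and variable names are illustrative.

```python
# (maximum frequency in MHz, core current in A) as reported for each phase
phases = {
    "Phase 1": (77, 0.37),
    "Phase 2": (66, 0.78),
    "Phase 3": (66, 0.83),
    "Phase 4": (66, 0.99),
}

base_freq, base_current = phases["Phase 1"]

overhead = {}
for name, (freq, current) in phases.items():
    freq_loss_pct = 100.0 * (base_freq - freq) / base_freq  # drop in maximum frequency
    current_ratio = current / base_current                  # growth in core current
    overhead[name] = (freq_loss_pct, current_ratio)
    print(f"{name}: {freq_loss_pct:.1f}% lower fmax, {current_ratio:.2f}x current")
```

On these numbers every mitigated phase loses about 14% of the maximum frequency, and the fully mitigated design draws roughly 2.7 times the core current of the single-string design.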

Experimental Test Designs

Service FPGA: XC2V3000

1. Configuration Monitor
• DUT configuration
• Continuous alternate scrubbing and readback at a rate of 4 per second
• SEFI detection

2. Functional Monitor
• Sends input vectors to the DUT
• Detects errors based on the DUT outputs
• Records errors and exception occurrences
• Runs continuous handshaking with the DUT to ensure its full synchronization with external peripherals

DUT FPGA: XQR2V6000

MicroBlaze design running
• Integer-based FFT software
• 33 MHz MicroBlaze clock speed
• 0.25 MHz GPIO bus

Two mitigated design versions:

1. Without BRAM Scrubber

2. With BRAM Scrubber

DUT/Service FPGAs Communication

[Figure: block diagram of the DUT (XQR2V6000, TMRed MicroBlaze) and the Service FPGA (XC2V3000, Functional Monitor and DUT Configuration Monitor). The functional interface bus and GPIO bus carry triplicated 16-bit data (Data_In_TR0 to TR2, Data_Out_TR0 to TR2), clocks (Clk-TR0 to TR2), resets (Rst-TR0 to TR2) and data-valid and exception signals (DVld-In, DVld-Out, DVld-Exc-In, DVld-Exc-Out, each as TR0 to TR2), with majority voters shown on the triplicated signals. The Functional Monitor handles handshaking, exception detection and data transfer. The DUT Configuration Monitor performs configuration, readback (SEU counting), scrubbing and SEFI detection through the SelectMAP port]

Experimental Setup

Tested at the Crocker Nuclear Laboratory at UC Davis using a 63.3 MeV proton beam

[Figure: test setup with the DUT and the Service FPGA labeled]

Proton Beam Results (1)

• Error Classification
― Type 1: FFT program calculates an incorrect result
― Type 2: MicroBlaze communication sequence is wrong or stops (timeout)
― Type 3: An exception or interrupt is invoked

• Error Recovery Types
― The MicroBlaze recovers on the next iteration of the program
― The MicroBlaze recovers when the processor is reset
― The MicroBlaze recovers after scrubbing the FPGA logic

• Non-Recovery Types (Type -R)
― Runaway Resets: upsets in the MicroBlaze code (stored in the BRAM) in at least two domains
― Runaway Exceptions: illegal operation on the MicroBlaze detected by the exception handler (DUT/Service)
― Runaway Errors: illegal code in the FFT computation code

Proton Beam Results (2)

Proton-Induced Cross Sections of Design 1 at Various Fluxes (error cross-sections in cm^2)

# | Flux [p/cm^2/s] | CLB Upsets / Scrub Cycle | Fluence [p/cm^2] | Type 1 | Type 1R | Type 2 | Type 2R | Type 3
(1) | 1.70 x 10^7 | 2 to 7 | 1.00 x 10^11 | 7.00 x 10^-11 | < 1.00 x 10^-11 | 5.00 x 10^-11 | < 1.00 x 10^-11 | < 1.00 x 10^-11
(2) | 1.70 x 10^8 | 15 to 30 | 1.03 x 10^11 | 2.92 x 10^-10 | 9.74 x 10^-12 | 2.05 x 10^-10 | 6.82 x 10^-11 | < 9.70 x 10^-12
(3) | 1.70 x 10^9 | 150 to 190 | 4.86 x 10^10 | 1.07 x 10^-9 | < 2.05 x 10^-11 | 7.82 x 10^-10 | 1.65 x 10^-10 | 3.60 x 10^-11

Proton-Induced Cross Sections of Design 2 at Various Fluxes (error cross-sections in cm^2)

# | Flux [p/cm^2/s] | CLB Upsets / Scrub Cycle | Fluence [p/cm^2] | Type 1 | Type 1R | Type 2 | Type 2R | Type 3
(1) | 1.94 x 10^7 | 2 to 7 | 9.79 x 10^10 | 7.56 x 10^-10 | 2.04 x 10^-11 | 6.34 x 10^-10 | 1.43 x 10^-10 | 8.17 x 10^-11
(2) | 3.87 x 10^7 | 4 to 15 | 2.49 x 10^10 | 8.44 x 10^-10 | < 4.02 x 10^-11 | 6.03 x 10^-10 | 2.01 x 10^-10 | 1.61 x 10^-10
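The cross-section figures can be related back to event counts. The sketch below assumes the standard definition, cross-section = observed events / fluence, which the slides do not state explicitly; the function names are illustrative. Under that assumption the Type 1 values reported for Design 1 correspond to near-integer event counts, and the "<" entries match the limit for a single event at the same fluence.

```python
def cross_section(events: int, fluence: float) -> float:
    """Cross-section in cm^2, assuming sigma = events / fluence (fluence in p/cm^2)."""
    return events / fluence


def implied_events(sigma: float, fluence: float) -> float:
    """Number of events implied by a reported cross-section at a given fluence."""
    return sigma * fluence


# (fluence [p/cm^2], Type 1 cross-section [cm^2]) for Design 1, runs (1) to (3)
design1_type1 = [
    (1.00e11, 7.00e-11),
    (1.03e11, 2.92e-10),
    (4.86e10, 1.07e-9),
]

for fluence, sigma in design1_type1:
    print(f"fluence {fluence:.2e}: about {implied_events(sigma, fluence):.1f} events")

# Limit corresponding to one event at the fluence of run (1)
one_event_limit = cross_section(1, 1.00e11)
print(f"one-event limit: {one_event_limit:.2e} cm^2")
```

The small event counts behind each entry are worth keeping in mind when comparing rows: a difference of one or two events changes a low cross-section noticeably.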

Conclusion

• A complete solution to mitigate an embedded processor implemented on a Xilinx Virtex-II FPGA, based on:

― Continuous external configuration scrubbing,

― Functional-block design triplication,

― Independent internal BRAM scrubbing (also triplicated).

• High area and power dissipation penalties after replacement of the distributed RAMs

• At low flux: very low error cross-section (1.2 x 10^-10 cm^2)

• The error cross-section increases rapidly with increasing flux

• For the space environment, it is predicted that the error rate of a MicroBlaze design should be lower than the SEFI rate, which demonstrates the high efficacy of this solution

Learned Lessons

• Check whether your design includes SRL16s or distributed RAMs before enabling active scrubbing

• Do the SMOKE test: break one domain and ensure that the design is still running

• Reduce the flux to respect the first rule of the TMR mitigation technique (1 upset / scrub cycle)
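The smoke test can be rehearsed in software before going to the beam. The sketch below is a toy model under stated assumptions: `domain` is a hypothetical stand-in for one redundant copy of the design, `broken` models a domain stuck at a garbage value, and `run_tmr` votes the three outputs. It breaks each domain in turn and checks that the voted output still matches the reference.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def domain(x: int) -> int:
    """Hypothetical stand-in for one redundant copy of the design (16-bit result)."""
    return (x * 3 + 1) & 0xFFFF


def broken(x: int) -> int:
    """A deliberately broken domain: its output is stuck at a garbage value."""
    return 0xDEAD


def run_tmr(x: int, broken_domain=None) -> int:
    """Run all three domains, optionally breaking one, and return the voted output."""
    outs = [broken(x) if d == broken_domain else domain(x) for d in range(3)]
    return vote(*outs)


# Break each domain in turn and compare the voted output with the reference.
results = {
    d: all(run_tmr(x, broken_domain=d) == domain(x) for x in range(256))
    for d in range(3)
}
print(results)
```

If any entry is false, a single broken domain can reach the output, which points to logic that was not triplicated or not voted.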
