Joel Seely Technical Marketing Manager Military & Aerospace Business Unit
Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection
Joel Seely, Technical Marketing Manager
Military & Aerospace Business Unit
Single Event Upset (SEU) Overview for SRAM-Based FPGAs
Copyright © 2004 Altera Corporation
Definitions
- SEU: Single Event Upset
  - Unwanted Change in State of a Latch or a Memory Cell
- SER: Soft Error Rate
  - SEU Rate
- SEFI: Single Event Functional Interrupt
  - Functional Failure by SEU
  - Not All SEUs are SEFIs
  - Generally Takes 5-10 SEUs to Cause SEFI
Circuit Components of SRAM-Based FPGAs
- I/O Registers & I/O Configuration
  - No Issue, Very Robust Registers, < 1 FIT
- Logic Registers (LEs)
  - No Issues, Very Robust Registers, < Hard Error Rate
- User Memory
  - Typically On-Chip Memories are “By 9” for Parity Checking
  - IP Available for ECC
- Configuration RAM (CRAM) for LUTs & Routing
  - Area of Focus
Upset of a CRAM Cell
[Figure: 6 Transistor Cell (Data In, Add, Clear, Data Out, Vcc, Vss) with two Voltage vs. Time waveforms, and a plot of Noise Current for 10 fC Collected Charge: Current (µA, 0 to 200) vs. Time (ps, 0 to 200).]
SEU Induced Failure Rate*
Device   LE Count   SEU Rate (FIT)   SEFI Rate (FIT)   MTBF** (Years)
EP1C6    6K         250              60                1,900
EP1C20   20K        730              180               634
EP1S25   26K        1950             400               285
EP1S80   79K        6000             1200              95
* Data at Sea Level
** MTBF: Mean Time Between Functional Interrupt
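The MTBF column appears to follow directly from the SEFI rate. The sketch below assumes the usual definition of FIT (failures per 10^9 device-hours) and an 8,760-hour year; the slide does not state the conversion, and the function name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)

def fit_to_mtbf_years(fit: float) -> float:
    """Convert a rate in FIT (failures per 1e9 device-hours) to MTBF in years."""
    return 1e9 / (fit * HOURS_PER_YEAR)

# SEFI rates (FIT) copied from the table above
sefi_fit = {"EP1C6": 60, "EP1C20": 180, "EP1S25": 400, "EP1S80": 1200}

for device, fit in sefi_fit.items():
    print(f"{device}: {fit_to_mtbf_years(fit):,.0f} years")
```

Under these assumptions the results land within rounding of the tabulated values, which suggests the MTBF figures were derived from the SEFI rate rather than the raw SEU rate.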
Number of CRAM Bit Upsets for Each Occurrence of Functional Upset
[Figure: two probability plots of cumulative percentage (0.5% to 99.5%, Std Deviation -3 to 3) vs. # of CRAM bit upsets for each event of functional upset (0 to 50). Panels: Altera EP1S25 Neutron SER - WNR data; Altera EP1S25 Alpha SER. Median annotations: ~6 and 5.]
Addressing System-Level Issues
SER Improvements/Mitigation
- Chip Design Enhancements
- New Materials & Process Enhancements
- Larger CRAM Structure
- Increase in Capacitance on Critical Node
- Smaller Process => Smaller Die => Lower SEU Probability
- Built-In Error Detection/Correction Circuitry
SER Per SRAM Bit Trend
[Figure: SER per SRAM MBit (axis marked 100 FITS and 1,000 FITS) vs. Process Technology and Year, from 0.5 µm (1995) to 0.13 µm (2002), with a 90 nm Projection.]
System Level Improvements/Mitigation
- ECC for User Memory
- Use Detection/Correction Feature
- Triple Module Redundancy (TMR)
  - To Achieve Lower Error Rate & Less Downtime
- Migrate to Structured ASIC
Soft Error Detection Methods
- Configuration RAM Readout
  - Read Out Full Bitstream
  - Compare with Stored Bitstream
  - Can Determine Where in Configuration Error Occurred
- Caveat: Security Issues with Reading Out Bitstream
[Figure: a Microprocessor or CPLD reads CRAM data from the FPGA and compares it with Stored CRAM Data: Same or Different?]
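The readout-and-compare approach can be sketched in software. This is a minimal illustration only: it assumes the read-back and stored bitstreams are available as equal-length byte strings, and `find_upsets` is a hypothetical helper, not part of any vendor tool.

```python
def find_upsets(readback: bytes, golden: bytes) -> list:
    """Return (byte_offset, bit_index) for every bit that differs."""
    if len(readback) != len(golden):
        raise ValueError("bitstreams must be the same length")
    upsets = []
    for offset, (r, g) in enumerate(zip(readback, golden)):
        diff = r ^ g  # set bits mark positions where the two copies disagree
        for bit in range(8):
            if diff & (1 << bit):
                upsets.append((offset, bit))
    return upsets

golden = bytes([0x3C, 0xA5, 0x00, 0xFF])   # stand-in for the stored bitstream
readback = bytearray(golden)
readback[2] ^= 0x10                        # simulate one upset: bit 4 of byte 2

print(find_upsets(bytes(readback), golden))  # [(2, 4)]
```

Because the comparison is bit by bit, the location of each upset falls out directly, which is the advantage this method has over a checksum-only check.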
Soft Error Detection Methods
- On-Chip SEU Detection
  - Dedicated Comparison Circuitry
    - e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM
  - Detection Circuitry Running Continuously
  - Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency
  - Error Signal Available Internally or Externally
- Caveat: Cannot Determine Where in Configuration Error Occurred
[Figure: inside the FPGA, the Computed Value is compared (=) with the Stored Value and the result is sent To Core.]
On-Chip Detection Example
- Dedicated CRC Circuit
  - Configuration RAM Verification Capability
  - 32-Bit Cyclic Redundancy Code Check
  - Verified Against Internally Stored Value
- Runs in the Background Without Impacting Device Performance
- Close to Real-Time Detection
  - Variable Clock Frequency
  - Depends on Number of CRAM Bits
- Multi-Event Detection
  - Up to 3-Bit for 32-Bit CRC
- Result Output to Either Core or Pin
  - Use with Either Internal or External Hardware for Error Correction
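The stored-versus-computed CRC comparison can be illustrated in software. The sketch below uses Python's `zlib.crc32` as a stand-in; the slides do not say which 32-bit polynomial the hardware uses, so treat the polynomial, the sample data, and the `crc_ok` helper as assumptions.

```python
import zlib

def crc_ok(cram: bytes, stored_crc: int) -> bool:
    """Recompute the CRC over the configuration data and compare with the stored value."""
    return zlib.crc32(cram) == stored_crc

cram = bytes(range(256)) * 4        # stand-in for configuration RAM contents
stored_crc = zlib.crc32(cram)       # computed once, when the configuration is known good

upset = bytearray(cram)
upset[100] ^= 0x01                  # simulate a single-bit upset

print(crc_ok(cram, stored_crc))           # True
print(crc_ok(bytes(upset), stored_crc))   # False
```

The check yields only pass or fail, which mirrors the caveat that on-chip CRC detection cannot say where in the configuration the error occurred.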
Correction Methods
- FPGA Detection, System-Level Correction
  - Lower Total Cost
  - Downtime Is Limited & Manageable
  - Used in Non-Critical Applications
- Triple Module Redundancy
  - Two Flavors
    - All On-Chip in FPGA
    - Separate Chips & Voter
  - Correction Can Be Real-Time
  - Used in Critical Applications
Single System Detection & Correction
- Step One: Detect the Soft Error
  - 75% of Reported Errors Are “Don’t Care” Errors
- Step Two: Alert the System
- Step Three: Fix the Error
  - In Some Cases, Re-Program the FPGA
  - In Some Cases, Reboot the Sub-System
  - In Some Cases, Reboot the System
- Need to Focus on System “Downtime”
  - Each System Has Unique Requirements
  - Re-Programming FPGA Takes < 250 ms
  - Rebooting Time Varies & Can Be Fast “by Design”
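To put the downtime point in rough numbers, the sketch below combines the sea-level SEU rate for the EP1S80 from the failure-rate table with the 250 ms re-programming bound. It assumes, pessimistically, that every detected SEU triggers a full re-program and that FIT means failures per 10^9 device-hours; neither assumption is stated on the slide, and system reboot time is not included.

```python
HOURS_PER_YEAR = 24 * 365          # 8,760 hours (assumption: leap years ignored)

def annual_downtime_seconds(rate_fit: float, repair_seconds: float) -> float:
    """Expected downtime per year for one device, given an event rate in FIT."""
    events_per_year = rate_fit * 1e-9 * HOURS_PER_YEAR
    return events_per_year * repair_seconds

# EP1S80 SEU rate (6000 FIT) and the < 250 ms re-programming bound
downtime = annual_downtime_seconds(6000, 0.250)
print(f"{downtime:.4f} s of re-programming time per device-year")
```

The result is an upper bound for a single device; a system with many FPGAs, or one that reboots rather than re-programs, would scale it accordingly.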
TMR Method 1
- Identical Hardware in FPGAs
- Use Voter Implemented in FPGA or CPLD
- Utilize Either Hardware Output or CRC Error Pin
- Voter Also Used to Signal Reconfiguration on Difference or Error
[Figure: FPGA Hardware 1, FPGA Hardware 2, and FPGA Hardware 3 feed an FPGA or CPLD (Voting).]
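The voter's role can be sketched as a bitwise two-of-three majority function that also flags any disagreement, which is the condition the slide ties to reconfiguration. This is a software model under assumed names (`vote`, `mismatch`); a real voter would be logic in the FPGA or CPLD.

```python
def vote(a: int, b: int, c: int) -> tuple:
    """Bitwise 2-of-3 majority vote; also report whether any input disagreed."""
    majority = (a & b) | (a & c) | (b & c)
    mismatch = not (a == b == c)   # would drive the reconfiguration request
    return majority, mismatch

print(vote(0b1011, 0b1011, 0b1011))  # (11, False): all three copies agree
print(vote(0b1011, 0b1111, 0b1011))  # (11, True): copy 2 is outvoted and flagged
```

The majority output stays correct while one copy is upset, and the mismatch flag gives the system a cue to reconfigure the faulty copy before a second upset accumulates.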
TMR Method 2
- Multiple Instantiations of Hardware in Single FPGA
- For Low-Rate SEUs
  - SEU Events May Occur Much More Frequently than Functional Error (De-Rating)
- Voter Signals Reconfiguration of FPGA
  - FPGA Must be Reconfigured
[Figure: Hardware 1, Hardware 2, and Hardware 3 inside a single FPGA feed a Voting Circuit.]
De-Rating Methodology
- Only a Fraction of Configuration Bits Are Actually Programmed
  - e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care”
  - Only About 20% of Routing Is Used
  - Depends on Utilization & Application
- Some Un-Programmed Bits Still Matter
  - Flipping Could Change Function of the Device
- Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
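The 75% figure for a partly used LUT can be checked by enumeration. The sketch assumes a 4-input LUT holds 16 configuration bits, one per input combination, and that the two unused inputs are tied to logic 0; the tie-off value is an assumption, not something the slide specifies.

```python
from itertools import product

LUT_INPUTS = 4
USED_INPUTS = 2   # remaining inputs assumed tied to logic 0

def reachable_entries(lut_inputs: int, used_inputs: int) -> set:
    """Addresses of LUT configuration bits that can ever be selected."""
    addresses = set()
    for used in product((0, 1), repeat=used_inputs):
        bits = used + (0,) * (lut_inputs - used_inputs)
        addresses.add(sum(bit << i for i, bit in enumerate(bits)))
    return addresses

total = 2 ** LUT_INPUTS
used = len(reachable_entries(LUT_INPUTS, USED_INPUTS))
dont_care_fraction = 1 - used / total

print(f"{used} of {total} LUT bits reachable; {dont_care_fraction:.0%} don't care")
```

Only 4 of the 16 entries can ever be addressed, so an upset in any of the other 12 has no effect on the LUT output, which is the intuition behind de-rating the raw SEU rate.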
Structured ASIC: Ultimate SEU Protection
- No Configuration Memory = Estimated SER is below Hard Failure Rate for the Device
[Figure: FPGA to Structured ASIC; PLD Architecture with ASIC Routing.]
Summary
- SEU is a Well Understood Phenomenon
- Many Chip Level Enhancements Mitigate SEUs
  - Process
  - Design
  - Manufacturing Techniques
- Easy Detection of SEU Events is Key
  - After Detection, Other Methods Must be Employed to Deal with the Event
  - Critical Nature of Application Determines Level of SEU Response
- Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM