Self-Checking Fault Detection using Discrepancy Mirrors

Ronald F. DeMara, Carthik A. Sharma (University of Central Florida). PDPTA 2005, Las Vegas.


Transcript of Self-Checking Fault Detection using Discrepancy Mirrors

Page 1: Self-Checking Fault Detection  using Discrepancy Mirrors

Self-Checking Fault Detection using Discrepancy Mirrors

Ronald F. DeMara, Carthik A. Sharma
University of Central Florida

PDPTA 2005, Las Vegas

Page 2: Self-Checking Fault Detection  using Discrepancy Mirrors

Fault Handling Overview

• Failure
  - Manifestation of a fault
  - Deviation from expected behavior

• Detection
  - Identify occurrence of fault
    - Fully articulating inputs
    - Intermittently articulating inputs
  - Methods
    - Coding-based schemes
    - Redundancy

• Isolation
  - Physical location of fault

[Figure: PCI-based card used for the Xilinx Virtex-II Pro based Autonomous Repair Testbed]

Page 3: Self-Checking Fault Detection  using Discrepancy Mirrors

Ideal Detection Characteristics

• Faults in the detector are covered by itself
  - Fault-secure
  - Self-testing
  - No “Golden Elements”

• Multiple types of faults handled by same detector
  - Transient and permanent faults
  - Logic and interconnect faults

• Minimum number of false positives
  - Accuracy and reliability

• Minimal power consumption

• Verifiable correctness

• Practical assessment
  - Fitness assessment should be tractable

Page 4: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror

Fault Coverage

• Mechanism for Checking-the-Checker (“golden element” problem)

• Makes checker part of configuration that competes for correctness [DeMara PDPTA-05]

Page 5: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Circuit

Fault Coverage

| Component | Fault Scenario 1 | Fault Scenario 2 | Fault Scenario 3 | Fault Scenario 4 | Fault-Free |
|---|---|---|---|---|---|
| Function Output A | Fault | Correct | Correct | Correct | Correct |
| Function Output B | Correct | Fault | Correct | Correct | Correct |
| XNORA | Disagree (0) | Disagree (0) | Fault: Disagree (0) | Agree (1) | Agree (1) |
| XNORB | Disagree (0) | Disagree (0) | Agree (1) | Fault: Disagree (0) | Agree (1) |
| BufferA | 0 | 0 | High-Z | 0 | 1 |
| BufferB | 0 | 0 | 0 | High-Z | 1 |
| Match Output | 0 | 0 | 0 | 0 | 1 |

Page 6: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Truth Table

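| A | B | XNORA | XNORB | ENBA | ENBB | TRIA | TRIB | MATCH |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |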

• Discrepancy Mirror Truth Table ensures complete coverage of detector.

• Single point of failure is reduced to a stuck-at fault exposure on the MATCH output (wired-OR)
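The fault scenarios and truth table can be checked with a small behavioral model. The sketch below is an illustration, not the authors' circuit description: the cross-coupled wiring (each XNOR enables its own tri-state buffer and supplies the data for the other one) is an assumption inferred from the fault-scenario table, and the names `discrepancy_mirror`, `tri_state`, and the `*_stuck` parameters are hypothetical. MATCH is modeled as a wired-OR with a pull-down, so an undriven line reads 0.

```python
HIGH_Z = None  # models an undriven (high-impedance) buffer output


def xnor(a: int, b: int) -> int:
    """Return 1 when the two function outputs agree, 0 otherwise."""
    return 1 if a == b else 0


def tri_state(data: int, enable: int):
    """Tri-state buffer: pass data when enabled, otherwise float."""
    return data if enable == 1 else HIGH_Z


def discrepancy_mirror(a, b, xnor_a_stuck=None, xnor_b_stuck=None):
    """Behavioral model of the mirror.

    a, b          : one-bit outputs of the two function instances
    xnor_*_stuck  : None for a healthy gate, or 0/1 for a stuck-at fault
                    (hypothetical fault-injection hook)
    """
    xnor_a = xnor(a, b) if xnor_a_stuck is None else xnor_a_stuck
    xnor_b = xnor(a, b) if xnor_b_stuck is None else xnor_b_stuck

    # Assumed wiring: each XNOR enables its own buffer and feeds the other.
    buffer_a = tri_state(data=xnor_b, enable=xnor_a)
    buffer_b = tri_state(data=xnor_a, enable=xnor_b)

    # Wired-OR with pull-down: MATCH is 1 only if some buffer drives a 1.
    match = 1 if 1 in (buffer_a, buffer_b) else 0
    return {"XNORA": xnor_a, "XNORB": xnor_b,
            "BufferA": buffer_a, "BufferB": buffer_b, "MATCH": match}


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, discrepancy_mirror(a, b)["MATCH"])
```

Under this model a single stuck-at fault in either XNOR gate can force MATCH low but cannot force it high while the function outputs disagree, which is the self-checking property the table describes. A stuck-at-1 fault on the MATCH line itself is not covered by the model, matching the residual exposure noted above.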

Page 7: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy-Enabled Isolation

Page 8: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Approach

• Selection Phase
  - Two candidates chosen from population
  - Use mutually exclusive resources
  - Carry out computation in tandem

• Detection Phase
  - Discrepancy Mirror compares outputs
  - MATCH output signifies fault-free configurations
  - Faults in the detector also covered

• Preference Adjustment Process
  - Detector output over time indicates relative fitness
  - Relative fitness can be used to choose candidates
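The three phases can be sketched as a simple dueling loop. This is a minimal illustration under stated assumptions rather than the authors' procedure: the +1/-1 fitness update, the exhaustive pairing order, and the names `run_duels`, `good`, and `faulty` are all hypothetical, and a discrepancy is encoded as 1 when the two outputs differ. Physical resource assignment is not modeled.

```python
from itertools import product


def discrepancy(output_left, output_right):
    """Return 1 if the two outputs differ, 0 if they match (assumed encoding)."""
    return 0 if output_left == output_right else 1


def run_duels(left, right, inputs, fitness=None):
    """Pair every left configuration with every right configuration.

    left, right : dict mapping a configuration name to a callable
    inputs      : iterable of input values applied to both candidates
    Each match raises both candidates' fitness by 1 and each discrepancy
    lowers both by 1 (a hypothetical preference-adjustment rule).
    """
    if fitness is None:
        fitness = {name: 0 for name in list(left) + list(right)}
    inputs = list(inputs)
    for (l_name, l_fn), (r_name, r_fn) in product(left.items(), right.items()):
        for x in inputs:
            delta = -1 if discrepancy(l_fn(x), r_fn(x)) else 1
            fitness[l_name] += delta
            fitness[r_name] += delta
    return fitness


def good(x):
    """Reference behavior: an arbitrary 4-bit function chosen for the demo."""
    return (3 * x) & 0xF


def faulty(x):
    """Same function with a hypothetical stuck-at-0 fault on output bit 0."""
    return good(x) & ~1


left = {"L0": faulty, "L1": good, "L2": good}
right = {"R0": good, "R1": good, "R2": good}
fitness = run_duels(left, right, inputs=range(8))
least_fit = min(fitness, key=fitness.get)  # candidate least preferred for selection
```

In this toy population the faulty individual accumulates the lowest fitness, while its healthy partners are penalized only for the duels they shared with it, which illustrates how detector output over time can separate relative fitness.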

Page 9: Self-Checking Fault Detection  using Discrepancy Mirrors

CRR Arrangement in SRAM FPGA

Configurations in Population
• $C = C_L \cup C_R$
• $C_L$ = subset of left-half configurations
• $C_R$ = subset of right-half configurations
• $|C_L| = |C_R| = |C|/2$

Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:

$$C_i^L \otimes C_i^R = \begin{cases} 0 & \text{if } Z(C_i^L) = Z(C_i^R) \\ 1 & \text{otherwise} \end{cases}$$

• $Z(C_i)$ is the FPGA data throughput output of configuration $C_i$
• Each half-configuration evaluates using an embedded checker (XNOR gate) within each individual
• Any fault in the checker lowers that individual’s fitness, so that the individual is no longer preferred and eventually undergoes repair

[Figure: CRR arrangement in an SRAM-based FPGA. Labeled blocks: Reconfiguration Algorithm; L Half-Configuration (Function Logic L, Discrepancy Check L); R Half-Configuration (Function Logic R, Discrepancy Check R); Off-chip EEPROM. Labeled signals: CONFIGURATION BIT STREAM, INPUT DATA, DATA OUTPUT, FEEDBACK, CONTROL. Note on the EEPROM: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]

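[Equations not recoverable from the transcript. Surviving labels: RS: EOR of the $C^L$ and $C^R$ terms, indexed by $i, j$ (Hamming Distance). WTA: hatted EOR of the same terms (Equivalence).]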

Page 10: Self-Checking Fault Detection  using Discrepancy Mirrors

Overview of FPGA operation

Competing Configurations
• Configurations A and B are physically distinct
• $C_A$ = subset consisting of ‘A’ configurations
• $C_B$ = subset consisting of ‘B’ configurations
• $|C_A| = |C_B| = |C|/2$

Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:

$$C_i^A \otimes C_i^B = \begin{cases} 0 & \text{if } Z(C_i^A) = Z(C_i^B) \\ 1 & \text{otherwise} \end{cases}$$

• $Z(C_i)$ is the FPGA data throughput output of configuration $C_i$
• Each half-configuration evaluates using an embedded checker (XNOR gate) within each individual
• Any fault in the checker or functional logic lowers the fitness of the resources used by that individual, leading to isolation

[Figure: FPGA operation with competing configurations in an SRAM-based FPGA. Labeled blocks: Reconfiguration Algorithm; Configuration A (Function Logic A, Discrepancy Mirror A); Configuration B (Function Logic B, Discrepancy Mirror B); Off-chip EEPROM. Labeled signals: CONFIGURATION BIT STREAM, INPUT DATA, DATA OUTPUT, FEEDBACK, CONTROL. Note on the EEPROM: a non-volatile memory is already required to boot any SRAM FPGA from cold start, so this is not an additional chip.]

Page 11: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Schematic: CMOS

PSpice Schematic

• 44 p- and n-channel MOS transistors

• 1.5 micron minimum width

• 600 nm length

• Width of PMOS transistors = 3 × width of NMOS transistors

Page 12: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Schematic: Xilinx

Xilinx Schematic

• Virtex-II Pro FPGA

• ModelSim-II Simulator

• Emulated (digital) Pull-down Resistor

Page 13: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Simulation: CMOS Circuit

Transient Response

• Behavior conforms to specifications

• Correct identification of Discrepancy

Page 14: Self-Checking Fault Detection  using Discrepancy Mirrors

Discrepancy Mirror Simulation: Xilinx ModelSim-II

Circuit Response

Output ‘High’ == 1 when input q1 == q2

Output ‘Low’ when input q1 != q2. In Xilinx FPGAs, ‘Low’ is not exactly equal to zero, but is a Logic ‘zero’ nevertheless.

Page 15: Self-Checking Fault Detection  using Discrepancy Mirrors

Fault Location Experiments

• Two experiments conducted
  - C-language program simulator
  - Locate fault by successive intersections
    - v-subsets or groups of resources
    - Fault identified after m comparisons: what is the value of m?
  - Identify number of iterations required to identify a single fault
    - Random inputs, single stuck-at fault
    - Expected number of pairings over 100 simulations
    - One ‘resource’ equivalent to one CLB (> 10 gates)

• Experiment 1
  - Perpetually articulating inputs

• Experiment 2
  - Intermittently articulating inputs

Page 16: Self-Checking Fault Detection  using Discrepancy Mirrors

Fault Location Using Dueling

Let
• $U$ denote the set of all logic resources on the FPGA
• $S$ denote the pool of resources suspected of being faulty; initially $|S| = |U|$
• $C_i \subseteq U$ denote the set of resources used by the $i$th configuration

To isolate the fault, $m$ successive intersections, $C_j \cap C_k$ (indices $i, j, k \le m$), are performed, at the end of which $|S| = 1$.

With pre-designed partitions to achieve maximal isolation:
• Isolation can be completed in $2n$ iterations, where $n = |C_i|$
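The narrowing of the suspect pool can be illustrated with a small Monte Carlo sketch. It is not the authors' C-language simulator: the update rule (keep only the resources used by a discrepant pairing, and exonerate the used resources when a perpetually articulated pairing shows no discrepancy), the random choice of resource sets, and the names `locate_fault`, `mean_pairings`, `utilization`, and `articulation` are assumptions made for illustration. Here `utilization` is the fraction of all resources used by one pairing, and `articulation` is the probability that the inputs expose the fault when it is in use.

```python
import random


def locate_fault(num_resources, utilization, articulation=1.0,
                 rng=None, max_pairings=100_000):
    """Shrink the suspect pool S until a single resource remains.

    Returns (faulty_resource, suspects, pairings). The max_pairings guard
    stops runs that cannot converge (for example utilization == 1.0).
    """
    rng = rng or random.Random()
    faulty = rng.randrange(num_resources)        # single stuck-at fault
    suspects = set(range(num_resources))         # initially |S| = |U|
    used_count = min(num_resources,
                     max(1, round(num_resources * utilization)))
    pairings = 0
    while len(suspects) > 1 and pairings < max_pairings:
        used = set(rng.sample(range(num_resources), used_count))
        pairings += 1
        discrepancy = faulty in used and rng.random() < articulation
        if discrepancy:
            suspects &= used                     # fault lies among used resources
        elif articulation >= 1.0:
            suspects -= used                     # no discrepancy exonerates them
        # Intermittent inputs: a silent pairing carries no information.
    return faulty, suspects, pairings


def mean_pairings(num_resources, utilization, articulation=1.0,
                  runs=100, seed=0):
    """Average number of pairings over several simulated faults."""
    rng = random.Random(seed)
    total = sum(locate_fault(num_resources, utilization, articulation, rng)[2]
                for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    print(mean_pairings(1_000, 0.5))
    print(mean_pairings(1_000, 0.5, articulation=0.5))
```

Under these assumptions, 50% utilization removes roughly half of the remaining suspects per informative pairing, so the pairing count grows with the logarithm of the number of resources. Lower or higher utilization makes each pairing less informative.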

Page 17: Self-Checking Fault Detection  using Discrepancy Mirrors

Analysis with Perpetually Articulating Inputs

Perpetually Articulating Inputs
• No observed discrepancy implies fault-free resources

Best Case (50% Utilized Capacity):
• 11.1 pairings for 1,000 resources
• 17.6 pairings for 100,000 resources

Most Demanding Case:
• 63.7 pairings for 100,000 resources with 5% capacity utilization

Page 18: Self-Checking Fault Detection  using Discrepancy Mirrors

Analysis with Intermittently Articulating Inputs

Intermittently Articulating Inputs
• Inputs may be such that the fault is not articulated at the outputs
• No observed discrepancy does not imply fault-free resources
• Only discrepant outputs provide fault-location information

Best Case (45% Utilized Capacity):
• 42 pairings for 1,000 resources
• 64.1 pairings for 100,000 resources

Most Demanding Case:
• 478 pairings for 100,000 resources with 95% capacity utilization
• 50% of the inputs articulate the fault

Page 19: Self-Checking Fault Detection  using Discrepancy Mirrors

Experimental Results Summary

• Number of iterations to detect faults depends on utilized capacity
  - Designs that utilize only a very few resources (< 20%), or almost all (> 80%) of the resources on the FPGA, pose difficult isolation problems
  - Each intersection exonerates (implicates) fewer individual resources

• Method scales well
  - 11.1, 14.9, 17.6 pairings required for 1,000, 10,000, and 100,000 resources
  - Sub-linear increase in location time

• Current Work
  - Competitive Runtime Reconfiguration (CRR) framework under development, which will utilize the methods outlined
  - Investigation of Competitive Group Testing methods to enable faster fault isolation
  - Analysis of characteristics of isolation, dependency on parameters, and optimal partitioning methods
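The sub-linear scaling can be made concrete with a short calculation on the three reported figures. The snippet uses only the numbers quoted above; the comparison against log2(10) is an added reference point (the cost of one extra decade for an ideal halving search), not a result from the slides.

```python
import math

# Reported pairings for the best case with perpetually articulating inputs.
reported = {1_000: 11.1, 10_000: 14.9, 100_000: 17.6}

sizes = sorted(reported)

# Extra pairings needed for each tenfold increase in resource count.
per_decade = [round(reported[big] - reported[small], 1)
              for small, big in zip(sizes, sizes[1:])]

# An ideal binary (halving) search needs log2(10) more steps per decade.
halving_per_decade = math.log2(10)

# Pairings per resource fall as the device grows, i.e. sub-linear growth.
pairings_per_resource = [reported[n] / n for n in sizes]

if __name__ == "__main__":
    print(per_decade, round(halving_per_decade, 2))
```

Each tenfold increase in resources adds only about three to four pairings, comparable to the log2(10) steps of a halving search, so the location time grows roughly logarithmically with device size.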

Page 20: Self-Checking Fault Detection  using Discrepancy Mirrors

Backup Slides Follow

Page 21: Self-Checking Fault Detection  using Discrepancy Mirrors

Accommodating Multi-bit Word Widths

• Proof of concept
  - The present circuit works efficiently
  - Demonstrates important Dueling-enabled isolation method

• Strategies
  - Use an array of detectors
    - Attempt to minimize points of failure as word width increases
    - Number of logic resources used is acceptable for smaller circuits
  - Create a new circuit or scheme, combining fault-tolerant coding-based methods with a single-fault-secure circuit
  - Current research is focused on improving the detector by investigating codes and fault-secure circuits
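The array-of-detectors strategy can be sketched as one one-bit comparison per bit position plus a reduction stage. This is a hypothetical illustration, not a circuit from the slides: `bit_match` stands in for a single one-bit detector, and the AND reduction in `word_match` is an assumed way to combine the per-bit results.

```python
def bit_match(a_bit: int, b_bit: int) -> int:
    """Stand-in for one one-bit detector: 1 if the bits agree, else 0."""
    return 1 if a_bit == b_bit else 0


def word_match(word_a: int, word_b: int, width: int):
    """Compare two width-bit words with one detector per bit position.

    Returns (per_bit, match), where per_bit[i] is the result for bit i
    (least significant bit first) and match is 1 only if every bit agrees.
    """
    per_bit = [bit_match((word_a >> i) & 1, (word_b >> i) & 1)
               for i in range(width)]
    match = int(all(per_bit))  # assumed AND reduction of the detector array
    return per_bit, match


if __name__ == "__main__":
    print(word_match(0b1010, 0b1000, width=4))
```

The detector count grows linearly with word width, and the reduction stage is itself a potential point of failure, which is the concern the strategies above aim to minimize.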

Page 22: Self-Checking Fault Detection  using Discrepancy Mirrors

Pull-down Resistor Considerations

• Proof of concept
  - The present circuit works in a verifiably correct manner
  - Can utilize synthesized (digital) pull-down resistors, which simulate the behavior of analog resistors
  - Demonstrates Dueling-enabled isolation method
  - Can be utilized without implementation problems for custom VLSI designs

• Alternative Approach
  - Alternate detector circuits for FPGA implementation are under investigation
  - Avoid using tri-state buffers and pull-down resistors, and use native digital components available on FPGAs