NATW 2008 Using Implications for Online Error Detection Nuno Alves, Jennifer Dworak, R. Iris Bahar...


NATW 2008

Using Implications for Online Error Detection

Nuno Alves, Jennifer Dworak, R. Iris Bahar

Division of Engineering Brown University

Providence, RI 02912

Kundan Nepal

Electrical Engineering Dept. Bucknell University

Lewisburg, PA 17837

Online error detection

•Purpose: Detect transient faults that may occur in a circuit during operation

•Critical as circuits scale to smaller sizes

•“Easy” in memory logic

•Not so easy in circuit logic

Common online detection techniques

1. Storing pre-computed test vectors in hardware

2. Duplicating the computation of disjoint hardware elements and voting on the result

3. Use of check bits

Our approach

•Find invariant relationships in a circuit

•Violations of these expected relationships can identify errors

Error detection implementation

Invariant relationships in circuits

[Figure: example circuit with nodes n1 through n8]

n5=1 → n8=0

These relationships are logic implications

Error detection with implications

n5=1 & n8=1 will generate an error in checker logic

[Figure: the same circuit (nodes n1 through n8) with checker logic for n5=1 → n8=0; the checker output is labeled ERROR]
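The checker for this one implication can be sketched in a few lines of Python. This is only an illustration of the logic stated on the slide: the function name `checker_n5_n8` is hypothetical, and real checker logic would be gates added to the circuit rather than software.

```python
def checker_n5_n8(n5, n8):
    """Error flag for the implication n5=1 -> n8=0.

    The implication is violated only when n5=1 and n8=1,
    so the flag reduces to a single AND of the two node values.
    """
    return n5 & n8


# Enumerate all four value combinations of the two nodes.
truth_table = {(n5, n8): checker_n5_n8(n5, n8)
               for n5 in (0, 1) for n8 in (0, 1)}

for (n5, n8), error in truth_table.items():
    print(f"n5={n5} n8={n8} ERROR={error}")
```

Only the combination n5=1, n8=1 raises the flag; the other three combinations are consistent with the implication.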

How we find implications

Collect Logic Values At Each Site

Validate Implications

Find Implications

Verilog Description

Logic Simulation
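The steps named above (simulate, collect the logic value at each site, find implications) can be sketched on a toy example. Everything here is an assumption made for illustration: the three-input circuit in `simulate` is hypothetical, and the search is a brute-force pairwise scan rather than the authors' tool flow. Because the toy circuit is simulated exhaustively, every implication found is already valid; with random vectors on a larger circuit, the candidates would still need the separate validation step.

```python
from itertools import product


def simulate(a, b, c):
    """Hypothetical 3-input circuit; returns the logic value at every site."""
    n1 = a & b
    n2 = b | c
    n3 = n1 & n2
    return {"a": a, "b": b, "c": c, "n1": n1, "n2": n2, "n3": n3}


def find_implications(rows):
    """Return every (p, vp, q, vq) such that p=vp -> q=vq holds in all rows."""
    sites = sorted(rows[0])
    found = []
    for p in sites:
        for q in sites:
            if p == q:
                continue
            for vp, vq in product((0, 1), repeat=2):
                # Rows in which the antecedent p=vp is actually observed.
                support = [r for r in rows if r[p] == vp]
                if support and all(r[q] == vq for r in support):
                    found.append((p, vp, q, vq))
    return found


# Exhaustive logic simulation over all 2^3 input vectors.
rows = [simulate(*bits) for bits in product((0, 1), repeat=3)]
implications = find_implications(rows)

for p, vp, q, vq in implications:
    print(f"{p}={vp} -> {q}={vq}")
```

In this toy circuit, n1=1 forces b=1 and therefore n2=1, so `n1=1 -> n2=1` appears in the output.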

We have implications. Now what?

Select Useful Implications

Remove Redundant Implications

Pick Best Implications For Given HW Overhead

Why should we remove implications?

•With all implications, we can generate checker logic for each implication.

•Inefficient!
▫A circuit can contain thousands of implications
▫Generating separate checker logic for each implication could more than double circuit size.

•We want to keep only the “most important” implications.

Removing redundant implications

[Figure: example circuit with nodes n1 through n13]

i1: n3=0 → n8=0

i2: n4=1 → n12=0

i3: n4=1 → n8=0

i4: n12=0 → n8=0

i5: n4=1 → n13=0
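One plausible reading of "redundant" is that an implication follows from others by transitivity. Here i3 follows from i2 and i4: if n4=1 and n8=1, then either n12=1 (violating i2) or n12=0 (violating i4), so checking i2 and i4 already catches any violation of i3. The sketch below assumes that reading; the function name `redundant_by_transitivity` and the literal encoding are hypothetical, not taken from the slides.

```python
def redundant_by_transitivity(implications):
    """implications: dict name -> (antecedent_literal, consequent_literal).

    An implication is reported as redundant when its consequent is reachable
    from its antecedent using only the other implications still kept.
    """
    redundant = []
    for name, (src, dst) in implications.items():
        # Build a graph from every other implication not already dropped.
        graph = {}
        for other, (s, d) in implications.items():
            if other != name and other not in redundant:
                graph.setdefault(s, []).append(d)
        # Depth-first search from the antecedent literal.
        stack, seen, reachable = [src], {src}, False
        while stack:
            literal = stack.pop()
            if literal == dst:
                reachable = True
                break
            for nxt in graph.get(literal, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if reachable:
            redundant.append(name)
    return redundant


# The five implications from the slide, as (node, value) literals.
slide_implications = {
    "i1": (("n3", 0), ("n8", 0)),
    "i2": (("n4", 1), ("n12", 0)),
    "i3": (("n4", 1), ("n8", 0)),
    "i4": (("n12", 0), ("n8", 0)),
    "i5": (("n4", 1), ("n13", 0)),
}

redundant = redundant_by_transitivity(slide_implications)
print(redundant)
```

Skipping implications that were already dropped keeps two mutually derivable implications from both being removed.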

Removing low coverage implications

•We only want implications that:
▫Detect many faults
▫Identify hard-to-detect faults
▫Cover faults not detected by other implications

•Finding these important implications requires:
▫fault analysis to determine the specific fault coverage for each implication
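Once fault analysis has produced a per-implication fault coverage, picking implications that "cover faults not detected by other implications" resembles a set-cover problem. The sketch below uses a standard greedy set-cover heuristic as a stand-in; it is not claimed to be the authors' selection method. The coverage sets, the fault names f1 to f5, and the budget expressed as a count of implications are all made-up assumptions.

```python
def pick_implications(coverage, budget):
    """Greedily choose up to `budget` implications.

    coverage: dict mapping implication name -> set of faults it detects.
    Each step picks the implication that adds the most uncovered faults.
    """
    chosen, covered = [], set()
    while len(chosen) < budget:
        candidates = [name for name in coverage if name not in chosen]
        best = max(candidates,
                   key=lambda name: len(coverage[name] - covered),
                   default=None)
        # Stop when nothing is left or the best candidate adds no new fault.
        if best is None or not (coverage[best] - covered):
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical fault-analysis result (illustrative data only).
example_coverage = {
    "i1": {"f1", "f2", "f3"},
    "i2": {"f2", "f3"},
    "i4": {"f4"},
    "i5": {"f3", "f4", "f5"},
}

chosen, covered = pick_implications(example_coverage, budget=2)
print(chosen, sorted(covered))
```

With this data, i2 is never chosen because every fault it detects is already covered by i1, which is the "low coverage" situation the slide describes.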

Reducing the number of implications

[Chart: share of implications (0% to 100%) classified as redundant implications, low coverage implications, and high-quality implications]

Covering faults with implications

•For each random input vector and each fault, the behavior of the implication-checked circuit falls into one of the following 4 categories:

          Error Propagates To Output    An Implication is Violated
Case 1    Yes                           Yes
Case 2    No                            Yes
Case 3    No                            No
Case 4    Yes                           No

Average distribution of the 4 scenarios

[Chart: average percentage of each case; vertical axis from 0 to 70%]

•Case 1: Error Propagated & Implication Violated

•Case 2: Error NOT Propagated & Implication Violated

•Case 3: Error NOT Propagated & Implication NOT Violated

•Case 4: Error Propagated & Implication NOT Violated

How often do we detect errors? Case1/[Case1+Case4]
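The four cases and the ratio Case1/[Case1+Case4] can be written out as a short calculation. The outcome list below is made-up illustrative data, not a result from the talk, and the names `classify` and `detection_rate` are hypothetical.

```python
from collections import Counter


def classify(propagated, violated):
    """Map one (fault, input vector) outcome to case 1, 2, 3 or 4."""
    if propagated and violated:
        return 1
    if not propagated and violated:
        return 2
    if not propagated and not violated:
        return 3
    return 4  # propagated but no implication violated


def detection_rate(counts):
    """Case1 / (Case1 + Case4): fraction of observable errors that are flagged."""
    observable = counts[1] + counts[4]
    return counts[1] / observable if observable else None


# Hypothetical outcomes as (error propagated, implication violated) pairs.
outcomes = ([(True, True)] * 3 + [(True, False)] * 1
            + [(False, True)] * 2 + [(False, False)] * 4)

counts = Counter(classify(p, v) for p, v in outcomes)
rate = detection_rate(counts)
print(dict(counts), rate)
```

Cases 2 and 3 do not enter the ratio because the error never reaches an output in those cases.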

Implications with fixed HW budgets

•Given a fixed HW budget, by how much can we reduce the probability of an undetected error?

[Chart: vertical axis from 0% to 20%; circuits b12, misex2, rd73, Z5xp1, clip, Z9sym, C499, C432, C1908; one series per HW budget of 10%, 30%, and 50%]

Conclusions

•Practical online error detection alternative based on implication validation

•No modification of targeted logic

•Checker logic is added off the critical path and runs in parallel with the rest of the circuit.

•For several circuits, we can detect almost 90% of all errors that propagate to a primary output.

•With only a 10% area overhead, the probability of an error being both observable and undetected is reduced to 11% on average.