Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors
Amir Yazdanbakhsh, David Palframan, Azadeh Davoodi, Mikko Lipasti, Nam Sung Kim
Department of Electrical and Computer Engineering
2
Introduction
• Technology scaling beyond 32nm degrades the manufacturing yield
  o Can be addressed by imposing restrictive design rules or just using regular fabrics [T. Jhaveri, SPIE’06]
  o Can be addressed by using configurable logic blocks to make post-silicon corrections [Y. Ran, TVLSI’06]
  o Redundancy-based techniques can also be used
    o Exploiting existing redundancy in high performance processors [P. Shivakumar, ICCD’12][S. Shyam, ASPLOS’06][J. Srinivasan, ISCA’05]
    o Incorporate redundancy at the granularity of a bit slice [K. Namba, PRDC’05]
3
Motivation
[Figure: prefix adder over bit positions 0 to 7; a defective prefix node impacts C4 and C7. Faulty vectors 0000…11010…0110 → Minimized vectors → Checker → Recover.]
• The checker detects a match with the faulty vectors and a small number of False Alarm vectors at runtime.
4
Contributions
Checker Unit Module
• Flexible option for online and operand-level fault detection
• Faulty vectors can be updated over time
• TCAM-based implementation which can store cubes with don’t cares
• No extra logic on the critical paths
False Alarm Vectors
• Efficient use of false alarm vectors to reduce the number of vectors to be checked, thus reducing the TCAM area
• Integrate the false alarm insertion into the ESPRESSO 2-level logic minimization tool
• The recovery flag is not falsely activated too frequently
5
Checker Unit: Comparison with a Redundancy-based Alternative
• Checker Unit
  [Figure: Operand Checker built around a TCAM holding cubes such as 0xxx11x0; a match drives Recover.]
  – Does not affect the critical path
  – Flexible checker unit (can update faulty vectors)
  – Online and operand-aware detection of failures
• Redundancy-based
  ⤫ Affects the critical path (large muxes)
  ⤫ Fixed design approach (cannot be updated)
  ⤫ Two out of three adders should always be fault-free
6
Overview of TCAM
• TCAM can store test cubes which have don’t care bits
• Conventional TCAM needs to support random access to a specific entry to update the key value at runtime
  – Requires a log N-to-N decoder for a TCAM with N entries
  – The checker unit does not need such a decoder
    • Each entry must be updated only once, every time the chip is turned on
    • Supporting sequential access to write the test cubes to the TCAM is sufficient
• In our framework the size of the TCAM can get impractically large if all the faulty vectors are individually stored
  – We propose to add a few false alarm vectors to reduce the number of entries in the TCAM and therefore reduce the TCAM size
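The ternary match a TCAM entry performs can be sketched in software. This is a minimal illustration, assuming cubes are written as strings over 0, 1, and x with the most significant bit first; the names `encode_cube` and `tcam_match` and the second cube `1111xxxx` are made up for the example and do not come from the slides. A hardware TCAM compares all entries in parallel, whereas this sketch loops over them.

```python
def encode_cube(cube):
    """Turn a cube string such as '0xxx11x0' into a (value, care_mask) pair."""
    value = 0
    care = 0
    for ch in cube:
        value <<= 1
        care <<= 1
        if ch in "01":
            care |= 1            # this bit position must match
            value |= int(ch)
        elif ch != "x":
            raise ValueError(f"unexpected symbol {ch!r}")
    return value, care


def tcam_match(entries, operand):
    """Return True if the operand matches any stored cube (stand-in for the Recover flag)."""
    return any(((operand ^ value) & care) == 0 for value, care in entries)


# Entries are written once, in order, mirroring the sequential-write scheme.
entries = [encode_cube(c) for c in ["0xxx11x0", "1111xxxx"]]
print(tcam_match(entries, 0b01011100))  # True: matches 0xxx11x0
print(tcam_match(entries, 0b10000000))  # False: matches neither cube
```

Storing each cube as a value plus a care mask means one don't-care cube stands in for every fully specified vector it contains, which is why fewer, larger cubes translate directly into fewer TCAM entries.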
7
False Alarm Insertion to Minimize the TCAM Size: Example
[Figure: example circuit with inputs A, B, C, D, E]
Identify cubes which excite fault
      A B C D E
V1    x 1 0 x x
V2    0 1 1 0 x
V3    1 1 1 0 0
V4    1 1 1 0 1
V5    x 0 0 x 1
V6    x 0 1 0 1
8
False Alarm Insertion to Minimize the TCAM Size: Example
Identify cubes which excite fault
      A B C D E
V1    x 1 0 x x
V2    0 1 1 0 x
V3    1 1 1 0 0
V4    1 1 1 0 1
V5    x 0 0 x 1
V6    x 0 1 0 1
Test cube minimization
      A B C D E
V1    x x 0 x 1
V2    x 1 0 x x
V3    x x x 0 1
V4    x 1 x 0 x
9
False Alarm Insertion to Minimize the TCAM Size: Example
Steps: Identify cubes which excite fault → Test cube minimization → Further minimization with False Alarm Insertion
Test cube minimization
      A B C D E
V1    x x 0 x 1
V2    x 1 0 x x
V3    x x x 0 1
V4    x 1 x 0 x
Further minimization with False Alarm Insertion
      A B C D E
V1    x x 0 x 1
V2    x 1 0 x x
V3    x x x 0 x
• We reduce the number of test cubes from 6 to 3
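The three cube sets of this example can be checked by brute force. The sketch below enumerates all 32 input vectors and confirms that the 4-cube set covers exactly the same vectors as the original 6 cubes, while the 3-cube set covers all of them plus a few extra vectors, which are the false alarms. The helper names `covers` and `on_set` are illustrative.

```python
from itertools import product


def covers(cube, minterm):
    """True if the cube (string over 0/1/x) contains the fully specified minterm."""
    return all(c in ("x", m) for c, m in zip(cube, minterm))


def on_set(cubes, width=5):
    """Enumerate every minterm covered by at least one cube."""
    minterms = ("".join(bits) for bits in product("01", repeat=width))
    return {m for m in minterms if any(covers(c, m) for c in cubes)}


# Cubes over inputs A B C D E, copied from the example.
original = ["x10xx", "0110x", "11100", "11101", "x00x1", "x0101"]
minimized = ["xx0x1", "x10xx", "xxx01", "x1x0x"]
with_false_alarms = ["xx0x1", "x10xx", "xxx0x"]

faulty = on_set(original)
assert on_set(minimized) == faulty            # exact minimization keeps the on-set
assert faulty <= on_set(with_false_alarms)    # every faulty vector is still detected
false_alarms = on_set(with_false_alarms) - faulty
print(len(faulty), sorted(false_alarms))
# 18 ['00000', '00100', '10000', '10100']
```

Merging V3 and V4 into `xxx0x` therefore costs four false alarm vectors (those with B = 0, D = 0, E = 0) out of 32 possible inputs and saves one TCAM entry.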
10
False Alarm Insertion
Problem Definition
• Reduce the number of cubes below the given threshold by adding as few false alarm vectors as possible
Why do we need False Alarms?
• Due to the area budget, the number of entries in the TCAM is limited
• The number of test cubes translates to the number of entries in the TCAM
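To make the problem statement concrete, here is a small brute-force sketch. It is not the EXPAND-FA procedure from the talk: it substitutes a simple greedy heuristic that repeatedly replaces two cubes by their supercube, choosing the pair that adds the fewest false alarm minterms, until at most `max_cubes` cubes remain. The function names and the `max_cubes` parameter are assumptions made for illustration, and the exhaustive enumeration is only practical for a handful of inputs.

```python
from itertools import combinations, product


def covers(cube, minterm):
    """True if the cube (string over 0/1/x) contains the minterm."""
    return all(c in ("x", m) for c, m in zip(cube, minterm))


def minterms_of(cubes, width):
    """All minterms covered by at least one cube."""
    all_minterms = ("".join(bits) for bits in product("01", repeat=width))
    return {m for m in all_minterms if any(covers(c, m) for c in cubes)}


def supercube(a, b):
    """Smallest single cube containing both a and b."""
    return "".join(x if x == y else "x" for x, y in zip(a, b))


def insert_false_alarms(cubes, max_cubes):
    """Greedily merge cube pairs until at most max_cubes remain."""
    width = len(cubes[0])
    faulty = minterms_of(cubes, width)
    cubes = list(cubes)
    while len(cubes) > max_cubes:
        best = None
        for i, j in combinations(range(len(cubes)), 2):
            merged = supercube(cubes[i], cubes[j])
            rest = [c for k, c in enumerate(cubes) if k not in (i, j)]
            # Cost: minterms the new cover flags that do not excite the fault.
            cost = len(minterms_of(rest + [merged], width) - faulty)
            if best is None or cost < best[0]:
                best = (cost, rest + [merged])
        cubes = best[1]
    return cubes, minterms_of(cubes, width) - faulty


cubes, false_alarms = insert_false_alarms(["xx0x1", "x10xx", "xxx01", "x1x0x"], 3)
print(len(cubes), len(false_alarms))  # 3 4
```

Because a supercube always contains both merged cubes, every faulty vector stays covered; only the number of extra (false alarm) vectors grows as the entry count shrinks.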
11
Using Two-Level Logic Minimization
• Two-level logic minimization can be used to minimize the number of test cubes
• We extend the ESPRESSO* tool with false alarm insertion to achieve further minimization
*ESPRESSO. http://embedded.eecs.berkeley.edu/pubs/downloads/espresso/.
12
False Alarm Insertion by Extending ESPRESSO
[Figure: two flowcharts.
 Overview of the main loop of ESPRESSO: Test cubes enter a loop with the boxes F = IRREDUNDANT (FON, FDC), F = REDUCE (FON, FDC), F = EXPAND (FON, FOFF), repeated until the "Stop Minimization?" check passes; the output is the Minimized test cubes.
 Extension with False Alarm insertion: Minimized cubes enter a loop with the boxes F = EXPAND-FA (FON, FOFF), F = IRREDUNDANT (FON, FDC), F = REDUCE (FON, FDC), F = EXPAND (FON, FOFF), repeated until "# vectors < threshold"; the output is the Minimized cubes with false alarm.]
13
False Alarm Insertion Example
[Figure: worked example with panels EXPAND-FA, IRREDUNDANT, REDUCE, EXPAND]
14
False Alarm Insertion Procedure
• Each call to the EXPAND-FA function expands multiple test cubes
  – How do we sequentially go through the on-set? (see the paper)
  – Which cube is selected to be expanded? (same section of the paper)
  – Stopping criterion: the target number of cubes is reached
15
False Alarm Insertion for One Cube
On-set cube and Offset Matrix
        A0 A1 A2 A3
ON1     0  0  0  x
OFF1    x  x  1  x
OFF2    x  1  x  x
OFF3    1  x  x  1
False Alarm Matrix
        A0 A1 A2 A3
B1      0  0  2  -
B2      0  2  0  -
B3      1  0  0  -
Total   1  2  2  -
[Figure: Karnaugh map over A0A1 and A2A3 with rows 1 1 0 0 / 0 0 0 0 / 0 0 0 0 / 1 0 0 0, marking OFF1, OFF2, OFF3, ON1, ON2, and ON’2]
• False Alarm Matrix (i, j)
  – Entry (i, j) indicates the number of false alarms between the off-set cube i and the (expanded) cube when literal j is dropped
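The False Alarm Matrix for ON1 can be recomputed directly from the cubes. The sketch below relies on one observation: dropping literal j from a cube adds exactly the cube with that literal flipped, so entry (i, j) is the number of minterms shared by that added cube and off-set cube i. The function names are illustrative, and `None` stands for the "-" entries where ON1 already has a don't care.

```python
def overlap(a, b):
    """Number of minterms common to cubes a and b (strings over 0/1/x)."""
    free = 0
    for x, y in zip(a, b):
        if x == "x" and y == "x":
            free += 1
        elif "x" not in (x, y) and x != y:
            return 0             # conflicting literals: empty intersection
    return 2 ** free


def false_alarm_matrix(on_cube, off_cubes):
    """Entry [i][j]: false alarms against off_cubes[i] when literal j of on_cube is dropped."""
    matrix = []
    for off in off_cubes:
        row = []
        for j, literal in enumerate(on_cube):
            if literal == "x":
                row.append(None)     # nothing to drop at this position
                continue
            flipped = "1" if literal == "0" else "0"
            added = on_cube[:j] + flipped + on_cube[j + 1:]
            row.append(overlap(added, off))
        matrix.append(row)
    return matrix


matrix = false_alarm_matrix("000x", ["xx1x", "x1xx", "1xx1"])
for row in matrix:
    print(row)
# [0, 0, 2, None]
# [0, 2, 0, None]
# [1, 0, 0, None]
totals = [sum(row[j] for row in matrix) for j in range(3)]
print(totals)  # [1, 2, 2]
```

The column totals suggest that dropping A0 is the cheapest expansion of ON1 here, at a cost of one false alarm minterm.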
16
Simulation Configuration
• Single-failure scenarios in various nodes of a 32-bit Brent-Kung adder (prefix adder)
• Generate all the test vectors for two failing cases, modeled by stuck-at-0 and stuck-at-1, using the ATALANTA* ATPG toolset
• Using the SPEC2006 suite for the workload-dependent case
  – Record the input arguments to the adder by running each benchmark on an x86 simulator
• Analyzing area overhead in 2-issue and 4-issue microprocessors
*H.K. Lee and D.S. Ha. Atalanta: an efficient ATPG for combinational circuits. Technical Report; Department of Electrical Engineering, Virginia Polytechnic Institute and State University, pages 93-12, 1993.
17
Comparison of Probability of Detection
• Probability of detection (PoD): percentage of times that the checker unit activates the recovery signal (could be false alarm or true positive)
• Average PoD degrades with decrease in the number of test cubes
• Average PoD after inserting false alarms does not degrade significantly in FA-128 or FA-64 or FA-32 compared to W/O FA
• This behavior is true for both workload-dependent and random cases
18
Comparison of False Alarm Insertion Algorithms
• FA-Ag Algorithm
  – At each iteration, all the cubes are expanded using the expand-FA procedure.
• Each entry indicates the fraction of false alarms out of the total number of detections, FA / (FA + TP)
  – FA denotes the number of false alarm minterms and TP the number of true positives, when a fault is truly happening.
• On average, FA-Ag results in more overhead with increase in the number of test cubes compared to FA
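The two metrics used in the comparison can be computed from a trace of input vectors as follows. This is a sketch under assumptions: the trace, the cube sets (borrowed from the 5-input example earlier in the talk), and the function name `detection_stats` are illustrative, and the false alarm fraction is taken to be FA / (FA + TP) as described in the text.

```python
def covers(cube, vector):
    """True if the cube (string over 0/1/x) contains the vector."""
    return all(c in ("x", v) for c, v in zip(cube, vector))


def detection_stats(trace, checker_cubes, faulty_cubes):
    """Return (probability of detection, false alarm fraction) over a trace."""
    tp = fa = 0
    for vector in trace:
        if any(covers(c, vector) for c in checker_cubes):
            if any(covers(c, vector) for c in faulty_cubes):
                tp += 1      # recovery raised and the fault is really excited
            else:
                fa += 1      # recovery raised although the fault is not excited
    detections = tp + fa
    pod = detections / len(trace)
    fa_fraction = fa / detections if detections else 0.0
    return pod, fa_fraction


faulty_cubes = ["xx0x1", "x10xx", "xxx01", "x1x0x"]
checker_cubes = ["xx0x1", "x10xx", "xxx0x"]       # after false alarm insertion
trace = ["01000", "00000", "11111", "10100"]      # hypothetical operand trace
pod, fa_fraction = detection_stats(trace, checker_cubes, faulty_cubes)
print(pod, fa_fraction)  # 0.75 and roughly 0.667
```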
19
Area Overhead
• Implemented approaches
  – Baseline K+1 (for K = 2, 4)
    • K-issue processor with 1 redundant component
  – K+TCAM
    • K-issue processor with checker implemented as TCAM
  – K+FPGA
    • K-issue processor with checker implemented as FPGA
• 2+TCAM has better area than 2+1 for 32 and 48 cubes
• 2+FPGA always has more area than baseline
• Similar behavior for 4+TCAM and 4+FPGA
20
Conclusion
• A new framework for online detection of failures at the operand level of granularity
  – Design a flexible TCAM-based checker unit
  – Propose a false alarm insertion algorithm to reduce the number of vectors below the given threshold
  – Incorporate the false alarm insertion algorithm into the ESPRESSO 2-level logic minimization tool
• Future work:
  – Use the checker unit for other existing modules inside the processor
  – Utilize the online and operand-aware detection for other types of faults, such as path delay faults
21
Questions?