SBCCI08

A comparative analysis of fault injection methods via enhanced on-chip debug infrastructures. J. M. Martins Ferreira [ [email protected] ], FEUP / DEEC, Rua Dr. Roberto Frias, 4200-465 Porto - PORTUGAL. André Fidalgo, Gustavo R. Alves, Manuel Gericota [ anf/gca/mgg @isep.ipp.pt ], ISEP / DEE, Rua Ant. Bernardino Almeida, 431, 4200-072 Porto - PORTUGAL. SBCCI’08: Gramado, Brazil, 1-4 September 2008. These slides are available at http://www.slideshare.net/josemmf

Transcript of SBCCI08

Page 1: SBCCI08

A comparative analysis of fault injection methods via enhanced on-chip debug infrastructures

J. M. Martins Ferreira [ [email protected] ]
FEUP / DEEC
Rua Dr. Roberto Frias
4200-465 Porto - PORTUGAL

André Fidalgo, Gustavo R. Alves, Manuel Gericota [ anf/gca/mgg @isep.ipp.pt ]
ISEP / DEE
Rua Ant. Bernardino Almeida, 431
4200-072 Porto - PORTUGAL

SBCCI’08: Gramado, Brazil, 1-4 September 2008
These slides are available at http://www.slideshare.net/josemmf

Page 2: SBCCI08

Outline of the presentation

• Introduction and motivation

• Setup, workbench, workflow

• Experimental results
– Basic, extended and OCD-FI
– OCD-FI extensions (EDAC, RTREG)

• Comparison and discussion

• Conclusion

Page 3: SBCCI08

Scope, focus, setup

• Scope: usage of OCD resources for validating fault tolerance / fault injection

• Focus: comparative analysis of experimental results for various OCD configurations and debugging scenarios

• Setup: a) 32-bit Freescale MPC-565, iSystem IC3000 (iTracePro), Winidea 2005 b) OCD enhancements in VHDL

Page 4: SBCCI08

Motivation

• OCD offers controllability and observability features that may be used to inject faults and observe their effect (R/W access to registers and memory)

• Usefulness for fault tolerance validation may be limited in bandwidth, coverage and repeatability / representativeness of results

• Mitigation is possible by enhancing OCD

Page 5: SBCCI08

Our approach

• Configurations: basic (2:8), extended (8:8), OCD-FI (with a fault injection module)

• Fault injection scenarios: off-line or real-time, predefined or on-the-fly

• OCD-FI is able to cope with error detection / correction and real-time requirements

• Comparison of results uses a common set of workload applications and FI campaigns

Page 6: SBCCI08

NEXUS FI for the MPC565

[Figure: NEXUS fault injection setup. Labels: Host Machine (Pentium PC), Debugger (Fault Injector), Campaign Data, Trace Data, Data Link, IC3000, NEXUS, MPC565 (CPU, OCD)]

Trace data: Program trace data output by the OCD

Campaign data: scripts that describe the FI experiments

NEXUS Debug Features                  Class   Usability for FI
Run-Control                           1       External Triggering
Breakpoints                           1       Internal Triggering
Watchpoints                           1       Real Time Triggering
Static Register and Memory Access     1       Static Fault Insertion
Program Trace                         2       Fault Effects Classification
Dynamic Register and Memory Access    3       Real Time Fault Insertion
Data Trace                            3       Improved Fault Effects Classification

Page 7: SBCCI08

OCD infrastructure developed to support this work

• NEXUS class 2 compliant with real-time memory access

• Adjustable data bus

• OCD configurations
– Basic (2,8)
– Extended (8,8)
– OCD-FI: comprises a fault injection module

[Figure: OCD block diagram. Labels: CPU core, ROM, RAM, I/O, BUSES, OCD (RCT, RWA, MQM, AUX PORT, Bus Snooper, Bus Master, FI)]

Page 8: SBCCI08

Fault injection: Workload applications

• Workload applications:
– Matrix adder (Madder)
– Vector sorter (Vsorter)
– LUT control algorithm (Xcontrol)

• Each application was implemented in two versions: normal and fault tolerant

• Fault tolerance by duplicating data in memory and repeating each operation
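
The duplication scheme can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `ft_matrix_add`, the list-of-lists matrix layout, and the comparison-based detection flag are all assumptions made for the example.

```python
def matrix_add(a, b):
    """Plain (non-FT) element-wise matrix addition."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def ft_matrix_add(a, a_copy, b, b_copy):
    """Hypothetical fault-tolerant adder: each operand is stored twice and the
    operation is repeated on the second copy. A mismatch between the two
    results raises the error-detection flag."""
    first = matrix_add(a, b)
    second = matrix_add(a_copy, b_copy)
    error_detected = first != second
    return first, error_detected


a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
a_copy = [row[:] for row in a]
b_copy = [row[:] for row in b]

clean_result, clean_flag = ft_matrix_add(a, a_copy, b, b_copy)

# Emulate a single bit-flip in one stored copy of operand a.
a_copy[0][1] ^= 1 << 3
faulty_result, faulty_flag = ft_matrix_add(a, a_copy, b, b_copy)
```

In this sketch a fault in either copy is detected by the comparison, at the cost of doubling both the memory footprint and the work.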

Page 9: SBCCI08

Fault injection campaigns

• Scripts that define 10 FI experiments during system operation

• 100 campaigns were executed for each scenario using the three workload applications (Madder, Vsorter, Xcontrol)

• FI campaigns mostly target memory positions and cause a bit-flip to emulate SEU effects
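
A single bit-flip of the kind used to emulate an SEU amounts to an XOR with a one-bit mask. The sketch below assumes 32-bit words and models memory as a Python dictionary; the address `0x1000` and the helper name `flip_bit` are illustrative only.

```python
def flip_bit(value, bit, width=32):
    """Return value with one bit inverted, emulating a single-event upset."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside the word")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


# Hypothetical memory image: address -> 32-bit word.
memory = {0x1000: 0x000000FF}

# Inject the fault: invert bit 8 of the word at the target address.
memory[0x1000] = flip_bit(memory[0x1000], bit=8)
```

Applying the same flip twice restores the original word, which is why the faulty value can be computed in advance whenever the original contents are known.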

Page 10: SBCCI08

Predetermination to improve performance of FI campaigns

• Predetermination of the contents of the target memory cell at the FI instant may be done through a “gold run” or by ensuring:
– Complete knowledge of the program flow
– Full observability of external inputs
– Precise control of the FI instant and location

• Otherwise the target memory cell must be read “immediately” before the FI instant
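
The difference between the two cases can be sketched as below. With predetermination the faulty value is computed before the experiment and only a write is needed at the FI instant; without it, the cell must first be read through the debugger. The `DebugMemory` class and its access counter are hypothetical stand-ins for OCD read/write access, used only to make the extra read visible.

```python
class DebugMemory:
    """Hypothetical stand-in for target memory reached through the OCD."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.accesses = 0  # number of debug accesses performed

    def read(self, address):
        self.accesses += 1
        return self.cells[address]

    def write(self, address, value):
        self.accesses += 1
        self.cells[address] = value


def inject_predetermined(mem, address, bit, expected_value):
    """Faulty value derived from contents known in advance: write only."""
    mem.write(address, expected_value ^ (1 << bit))


def inject_with_read(mem, address, bit):
    """Contents unknown: read the cell, then write back the flipped value."""
    mem.write(address, mem.read(address) ^ (1 << bit))


mem_a = DebugMemory({0x2000: 0b1010})
inject_predetermined(mem_a, 0x2000, bit=0, expected_value=0b1010)

mem_b = DebugMemory({0x2000: 0b1010})
inject_with_read(mem_b, 0x2000, bit=0)
```

Both calls leave the same faulty value in the cell, but the second one needs two debug accesses instead of one.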

Page 11: SBCCI08

Experimental scenarios

Configuration   Bandwidth      Predetermination      Fault injection   Set-up delay   Insertion delay
& scenario                     of the faulty value   method            (clk cycles)   (clk cycles)
BOF             MDI=2 MDO=8    YES                   Offline           22             35
BOF+            MDI=2 MDO=8    NO                    Offline           22             44
EOF             MDI=8 MDO=8    YES                   Offline           6              9
EOF+            MDI=8 MDO=8    NO                    Offline           6              18
BRT             MDI=2 MDO=8    YES                   Real Time         22             35
BRT+            MDI=2 MDO=8    NO                    Real Time         22             44
ERT             MDI=8 MDO=8    YES                   Real Time         6              9
ERT+            MDI=8 MDO=8    NO                    Real Time         6              18
OCD-FI          MDI=2 MDO=8    YES                   Real Time         57             2
OCD-FI+         MDI=2 MDO=8    NO                    Real Time         57             4

B: Basic; E: Extended; OCD-FI: OCD for Fault Injection
OF: Off-line; RT: Real-time; +: predetermination not required
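
The scenario labels follow a fixed pattern, so they can be decoded mechanically. The helper below is a small sketch of that naming convention; the function name and the dictionary keys are assumptions made for the example.

```python
def decode_scenario(name):
    """Split a scenario label such as 'BRT+' into its three parts."""
    # A trailing '+' marks scenarios that do not rely on predetermination.
    predetermined = not name.endswith("+")
    base = name.rstrip("+")
    if base == "OCD-FI":
        return {"configuration": "OCD-FI", "method": "Real-time",
                "predetermination": predetermined}
    configuration = {"B": "Basic", "E": "Extended"}[base[0]]
    method = {"OF": "Off-line", "RT": "Real-time"}[base[1:]]
    return {"configuration": configuration, "method": method,
            "predetermination": predetermined}


brt_plus = decode_scenario("BRT+")
eof = decode_scenario("EOF")
```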

Page 12: SBCCI08

Experimental results (%): B, E, OCD-FI

UERR: Undetected errors (incorrect final result that goes undetected)
DERR: Detected errors (error detection signal activated)
NERR: No errors (application ended correctly)

MAdder
Scenario   non-FT UERR   non-FT NERR   SW-FT DERR   SW-FT UERR   SW-FT NERR
OFF        19            81            28           13,9         58,1
BRT        19,4          80,6          28,3         13,8         57,9
ERT        19,2          80,8          28,1         13,9         58
OCD-FI     19            81            28           13,9         58,1
BRT+       19,5          80,5          28,4         13,8         57,8
ERT+       19,3          80,7          28,2         13,8         58
OCD-FI+    19,1          80,9          28,1         13,9         58

VSorter
Scenario   non-FT UERR   non-FT NERR   SW-FT DERR   SW-FT UERR   SW-FT NERR
OFF        98            2             97           2            1
BRT        98,1          1,9           96,8         2            1,2
ERT        98            2             96,9         2            1,1
OCD-FI     98            2             97           2            1
BRT+       98,2          1,8           96,7         1,9          1,4
ERT+       98,1          1,9           96,8         1,9          1,3
OCD-FI+    98            2             96,9         1,9          1,2

XControl
Scenario   non-FT UERR   non-FT NERR   SW-FT DERR   SW-FT UERR   SW-FT NERR
OFF, BRT, ERT, OCD-FI: Not possible
BRT+       29,3          70,7          29,1         1,5          69,4
ERT+       29,6          70,4          28,9         1,2          69,9
OCD-FI+    29,8          70,2          28,8         1,1          70,1
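
The three outcome classes can be expressed as a small classification routine. The sketch below is an assumption about how such a classifier might look: in particular, giving the detection signal precedence over the result comparison is a choice made for the example, and the sample runs are invented.

```python
from collections import Counter


def classify_outcome(final_result, golden_result, error_detected):
    """Map one FI experiment to DERR, UERR or NERR."""
    if error_detected:
        return "DERR"  # error detection signal activated
    if final_result != golden_result:
        return "UERR"  # incorrect final result that goes undetected
    return "NERR"      # application ended correctly


def tally(outcomes):
    """Percentage of experiments per class, rounded to one decimal place."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: round(100 * counts[label] / total, 1)
            for label in ("DERR", "UERR", "NERR")}


# Invented sample: (final result, golden result, detection signal).
runs = [
    (42, 42, False),  # correct result, nothing flagged
    (41, 42, False),  # wrong result, nothing flagged
    (41, 42, True),   # detection signal raised
    (42, 42, False),
]
summary = tally([classify_outcome(*run) for run in runs])
```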

Page 13: SBCCI08

Experimental results (%): Erroneous fault insertions

• Further experiments in RT scenarios were carried out to identify erroneous FI, which were classified as Inconclusive (INC)

           non-FT                               SW-FT
Scenario   MAdder   VSorter   XControl          MAdder   VSorter   XControl
OFF        0                                    0
BRT        3,1      0,9       Not possible      4        2,2       Not possible
ERT        1,4      0,6       Not possible      2,3      1,1       Not possible
OCD-FI     0,2      0,1       Not possible      0,2      0,2       Not possible
BRT+       3        1,2       2,1               4,8      2,8       3,2
ERT+       2        0,8       1,5               3,7      2,1       2,4
OCD-FI+    0,4      0,2       0,3               1,7      1,2       1,3

Page 14: SBCCI08

Experimental results: Pros and cons of FI methods

• Off-line configurations always produce the most reliable results

• The CPU may overwrite the target memory cell before the FI is complete (INC)

• INC results increase with the delay between fault triggering and fault insertion, and are mitigated by OCD-FI and predetermination
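
The overwrite condition can be illustrated with a simple timing check. This is a simplified model, not the authors' method: it assumes the vulnerable window runs from the trigger to the end of the insertion delay, uses the insertion delays of the "+" scenarios as example values, and places one invented CPU write at cycle 120.

```python
def is_inconclusive(trigger_cycle, insertion_delay, cpu_write_cycles):
    """True if the CPU writes the target cell after the trigger but before
    the fault insertion has completed."""
    done = trigger_cycle + insertion_delay
    return any(trigger_cycle < w <= done for w in cpu_write_cycles)


# Assumed insertion delays in clock cycles for three real-time scenarios.
insertion_delays = {"BRT+": 44, "ERT+": 18, "OCD-FI+": 4}

# Invented workload behaviour: the CPU writes the target cell at cycle 120.
cpu_writes = [120]

results = {name: is_inconclusive(100, delay, cpu_writes)
           for name, delay in insertion_delays.items()}
```

Under these assumptions only the scenario with the longest delay is hit by the write, which matches the trend stated above: the shorter the delay, the smaller the window in which the CPU can interfere.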

Page 15: SBCCI08

Experimental results (%): OCD-FI extensions for EDAC

• FT versions of the workload applications were not used due to EDAC

DERR: Percentage of errors detected that were corrected by EDAC

           No Predetermination           Predetermination
           Derr   Uerr   Nerr   INC      Derr   Uerr   Nerr   INC
MAdder     39,6   0      58,8   1,6      39,7   0      59,5   0,8
VSorter    98,3   0      0,8    0,9      99     0      0,7    0,3
XControl   29,9   0      69,1   1        30     0      69,5   0,5

Page 16: SBCCI08

Experimental results: Pros and cons of OCD-FI EDAC extensions

• EDAC mechanisms effectively eliminate the effects of single bit-flip errors on the target system

• The OCD-FI EDAC extension enables FI into protected memory blocks
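
The slides do not say which EDAC code protects the memory. As an illustration of why a single bit-flip has no visible effect on a protected word, the sketch below uses a Hamming(7,4) code, which is an assumption chosen for brevity; the function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into 7 bits (positions 1..7, parity at 1, 2, 4)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 is unused so indices match bit positions
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits):
    """Return (value, corrected): corrects at most one flipped bit."""
    code = [0] + list(bits)
    # The XOR of the positions of all set bits is zero for a valid codeword
    # and equals the position of the flipped bit after a single error.
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    corrected = syndrome != 0
    if corrected:
        code[syndrome] ^= 1
    data = [code[3], code[5], code[6], code[7]]
    return sum(bit << i for i, bit in enumerate(data)), corrected


stored = hamming74_encode(0b1011)
stored[4] ^= 1  # inject a single bit-flip into the protected word
value, was_corrected = hamming74_decode(stored)
```

Because the decoder repairs the flip on every read, a fault injector that only writes data through the normal path would have its faults masked, which is why a dedicated extension is needed to reach protected memory blocks.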

Page 17: SBCCI08

Experimental results (%): OCD-FI for RTREG

• RT register access requires a collision manager that degrades dynamic performance…

           non-FT          SW-FT
           Uerr   Nerr     Derr   Uerr   Nerr
MAdder     89     11       62     22     16
VSorter    60     40       46     14     40

Page 18: SBCCI08

Experimental results: Pros and cons of OCD-FI RTREG extensions

• Due to their higher occurrence rate, INC results were explicitly avoided

• Not all code lines qualify to trigger an FI experiment (45% of the code lines could be used for triggering accumulator FI)

• FI results and software fault tolerance efficiency differ significantly between registers and memory

Page 19: SBCCI08

Performance (FI rate)

• Maximum faults / second rates (single bit-flips on the same memory cell, 30 MHz clock frequency):

Conf. & Scenario   Real Time      Halted Access
BOF+               Not possible   400k
EOF+               Not possible   1150k
BRT+               454k           400k
ERT+               1250k          1150k
OCD-FI+            491k           483k
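
The real-time figures appear consistent with one fault per set-up plus insertion interval at 30 MHz, using the delays listed on the experimental scenarios slide. This relationship is an inference from the numbers, not something the slides state, and it does not reproduce the halted-access column.

```python
CLOCK_HZ = 30_000_000  # clock frequency quoted for the FI rate measurements

# (set-up delay, insertion delay) in clock cycles for the "+" RT scenarios.
DELAYS = {"BRT+": (22, 44), "ERT+": (6, 18), "OCD-FI+": (57, 4)}


def max_fi_rate(setup_cycles, insertion_cycles, clock_hz=CLOCK_HZ):
    """Faults per second if every fault costs set-up plus insertion cycles."""
    return clock_hz / (setup_cycles + insertion_cycles)


# Rates in thousands of faults per second, truncated like the slide values.
rates_k = {name: int(max_fi_rate(*cycles) // 1000)
           for name, cycles in DELAYS.items()}
```

Under this assumption the computed rates are 454k, 1250k and 491k faults per second, matching the Real Time column.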

Page 20: SBCCI08

Performance (overhead, dynamic)

• Silicon overhead and maximum operating frequency on a Virtex-2 FPGA:

CPU Core   OCD   OCD-FI    EDAC   RTREG   Area [Eq Gates]   Overhead [%]   Max f [MHz]
x          -     -         -      -       53926             75,4           37
x          -     -         x      -       55018             76,9           32
x          BRT   -         -      -       71527             100,0          36
x          BRT   -         x      -       72619             101,5          32
x          ERT   -         -      -       76127             106,4          36
x          -     x         -      -       71842             100,4          36
x          -     +EDAC     x      -       73184             102,3          32
x          -     +RTREG    -      x       76392             106,8          27
x          -     +BOTH     x      x       77484             108,3          25
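
The overhead column reads as area relative to the CPU core with the basic OCD (the BRT row, listed as 100,0%). That reading is an inference from the numbers rather than a statement in the slides; the short check below recomputes the column under that assumption, with row labels invented for the example.

```python
BASELINE_AREA = 71527  # CPU core + basic OCD (BRT), the 100,0% row

# Equivalent-gate counts from the table; the keys are illustrative labels.
AREAS = {
    "CPU": 53926,
    "CPU + EDAC": 55018,
    "CPU + BRT + EDAC": 72619,
    "CPU + ERT": 76127,
    "CPU + OCD-FI": 71842,
    "CPU + OCD-FI + EDAC": 73184,
    "CPU + OCD-FI + RTREG": 76392,
    "CPU + OCD-FI + BOTH": 77484,
}

# Area of each configuration as a percentage of the baseline.
relative = {name: round(100 * area / BASELINE_AREA, 1)
            for name, area in AREAS.items()}
```

The recomputed values agree with the table to one decimal place, so the full OCD-FI with both extensions costs about 8% more area than the basic OCD configuration.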

Page 21: SBCCI08

Conclusions

• Wide spectrum (FPGA, ASIC, etc.)

• FI rate does not justify real-time

• Low overhead

• Better C&O than radiation techniques

• Less intrusive than software techniques

• Should be used with the final HW and SW

• Limitations in coverage, lack of standards