Course Outline FP7-4: Introduction to Reliability and Fault Tolerance...

Page 1: Course Outline FP7-4: Introduction to Reliability and Fault Tolerance …homes.et.aau.dk/yang/course/IRFT/FT06_LN2.pdf · Fault Avoidance Fault Masking Fault Tolerance Lecture 2 Lecture

Lecture 2: Faults and Their Effects

• Fault-Error-Failure

• Characteristics of faults

• Classification of faults

• Failure modes and effects

Reading: [WRD] Chapter 3

FP7-4: Introduction to Reliability and Fault Tolerance

Lecture 2: Lecture Notes on Introduction to Reliability and Fault Tolerance, by Youmin Zhang (AUE)

Course Outline

1. General introduction to reliability and fault tolerance

2. Faults and their effects

3. Redundancy and fault tolerance techniques

4. Analysis and safety standards of dependable/safety-critical systems

5. Case study: Redundancy and fault tolerance techniques in Boeing 777 primary flight control/computer systems

Today’s Goal

You should be able to understand

• What are faults, errors, and failures?

• Characteristics of faults

• Classification of faults

• Failure modes in computer systems

• Failure effects in computer systems

Fault-Error-Failure (review)

• Fault: deviation of function from design value
  – Hardware
  – Software

• Error: manifestation of fault by incorrect value

• Failure: deviation of system function from specification

• Three-world model:

Cause-Effect Sequence of Faults

• Passive or dormant fault
  – A fault is dormant or passive if it is present in the product but the functioning inside the product is not disturbed

• Active fault
  – The fault is active if it has an effect on the product functioning
  – The effect of a fault is manifested by an error in an internal component or module. The transformation of a fault into an error is called fault activation

• Error propagation
  – The error is propagated inside the product till it reaches the outputs of the product, hence creating a failure
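The sequence from dormant fault to failure can be illustrated with a small sketch. Everything in it is hypothetical and chosen only for illustration: the module `faulty_square`, its injected fault (a wrong result for the inputs 2 and 3), and the threshold output `alarm`. It suggests how a fault can stay dormant until an input activates it, and how an activated error becomes a failure only if it reaches the output.

```python
def faulty_square(x):
    """Hypothetical internal module: squares x, but carries an injected fault."""
    if x in (2, 3):
        return x * x + 2  # the fault: wrong value for these inputs only
    return x * x


def alarm(value):
    """Hypothetical system output: a threshold decision on the module's value."""
    return value > 5


def observe(x):
    """Classify what an outside observer would see for input x."""
    expected_value = x * x
    actual_value = faulty_square(x)
    if actual_value == expected_value:
        return "dormant"  # fault present but not activated by this input
    if alarm(actual_value) == alarm(expected_value):
        return "error"    # fault activated, but the error did not reach the output
    return "failure"      # the error propagated to the output


for x in (1, 3, 2):
    print(x, observe(x))
```

In this sketch the input 1 leaves the fault dormant, the input 3 activates it without changing the output, and the input 2 produces a wrong output.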

Latency

• Initial activation
  – A fault remains passive until an error is produced in a module of the structure of the product. The first occurrence of an error provoked by the fault is called initial activation

• Latency
  – Latency is the mean time between the fault occurrence and its initial activation as an error

(Geffroy and Motet, 2002)
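As a rough illustration of the latency definition, the sketch below averages the delay between fault occurrence and initial activation over several recorded faults. It assumes both timestamps are known for each fault, which is often not the case in practice; the function name `mean_latency` and the sample times are hypothetical.

```python
from statistics import mean


def mean_latency(events):
    """Mean time between fault occurrence and initial activation.

    events: list of (occurrence_time, initial_activation_time) pairs,
    both in the same time unit (an assumption of this sketch).
    """
    delays = []
    for occurrence, activation in events:
        if activation < occurrence:
            raise ValueError("activation cannot precede fault occurrence")
        delays.append(activation - occurrence)
    return mean(delays)


# Hypothetical records: (fault occurrence, first error), in hours
records = [(0.0, 4.0), (10.0, 12.0), (20.0, 29.0)]
print(mean_latency(records))
```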

Attributes of a Fault

• Cause
  – What leads to the fault

• Nature
  – Relates to the intent of the cause of the fault

• Duration
  – Length of time for which the fault persists

• Extent
  – How far does a fault propagate?

• Value
  – Consequence of the fault

Origin of Faults

• Phenomenological
  – Physical
    • Adverse phenomena: threshold changes, open circuits, short circuits
  – Human made
    • Results from human imperfection: operator mistake

• System Boundaries
  – Internal faults
    • Parts of the system that produce an error when invoked
  – External faults
    • Interference or interaction with the physical environment: EMI, electrostatic perturbation, radiation, temperature, humidity, mechanical vibration, power surge, cosmic rays, α-particle hits, etc.

Origin of Faults (cont’d)

• Phase of Creation
  – Design faults
    • During development and modification
  – During establishment of the procedures
  – Implementation
    • Faulty/incorrect component, construction, wiring, coding

• Phase during Operation
  – Operation faults
  – Occur during system use
  – Wear-out of components
  – Overrun of specification
  – Operator error

Origin of Faults (cont’d)

[Figure: diagram with the labels Specification Mistakes, Components Defects, External Disturbances, Implementation Mistakes; Hardware Faults, Software Faults; Errors; System Failures; Fault Avoidance, Fault Masking, Fault Tolerance]

Origin of Faults (cont’d)

Types of Faults (temporal/permanent persistence)

• Permanent faults:
  – Total failure of a component
  – Caused by, for example, short-circuits or melt-down
  – Remain until the component is repaired or replaced

• Transient faults:
  – Temporary malfunctions of a component
  – Caused by magnetic or ionizing radiation, or power fluctuation

• Intermittent faults:
  – Repeated occurrences of transient faults
  – Caused by, for example, loose wires
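The three persistence types can be told apart, at least roughly, from a record of when faulty behaviour was observed. The sketch below is only a heuristic under stated assumptions: the component is sampled at regular intervals, each sample is a boolean (True when it misbehaves), and a fault still present at the end of the window is treated as permanent. The function name and the rule itself are illustrative, not taken from the lecture.

```python
def classify_persistence(samples):
    """Guess the persistence type from boolean observations.

    samples: list of booleans, True where faulty behaviour was observed.
    """
    episodes = 0       # number of separate runs of faulty behaviour
    previous = False
    for current in samples:
        if current and not previous:
            episodes += 1
        previous = current

    if episodes == 0:
        return "none"
    if episodes > 1:
        return "intermittent"  # the malfunction keeps coming back
    # Exactly one episode: permanent if it lasts to the end of the window
    return "permanent" if samples[-1] else "transient"


print(classify_persistence([False, True, True, True]))
print(classify_persistence([False, True, False, False]))
print(classify_persistence([True, False, True, False, True]))
```

A finite observation window cannot prove permanence; a long transient would be misclassified here, which is why the rule is only a first approximation.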

Types of Faults (temporal/permanent persistence) (cont’d)

[Figure: diagram with the cause labels Physical Defect, Incorrect Design, Unstable or Marginal Hardware, Unstable Environment, Operator Mistake; the fault types Permanent, Intermittent, Transient; and the outcomes Error, Failure]

Characteristics of Faults

Nature of Faults (functional vs. technological)

• Functional faults:
  – Also called conceptual faults or human-made faults
  – Affect the way a product is specified, designed, produced or used
  – An incorrect design caused by the omission of one piece of the specification is an example of a functional fault
  – Failures due to functional faults are named systematic failures

• Technological faults:
  – Affect the means of implementation
  – Often relate to hardware faults and physical faults
  – Failures due to technological faults are named disruptive failures or disruptions

Fault Classification

(Avizienis et al, 2001)

Fault Classification (cont’d)

(Avizienis et al, 2001)

Failure Effects

• Internal effects
  – Fault-Error-Failure

• External effects: consequences
  – Benign: no serious consequences (minor failure)
  – Significant: major failure
  – Serious: dangerous failure
  – Catastrophic: disastrous failure

(Geffroy and Motet, 2002)

Failure Effects - An Example

• External effects: consequences
  – Benign: when the failure provokes only a partial reduction of the functions of the aircraft
  – Significant: when a significant reduction of the functions of the aircraft is induced by the failure
  – Serious: when the reduction of the functions of the aircraft does not allow a normal completion of the flight
  – Catastrophic: when the flight cannot be continued, or the landing is impossible
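One way to work with the four consequence levels in software is to treat them as an ordered scale. The sketch below is a minimal illustration, assuming that the levels are strictly ordered from benign to catastrophic; the numeric values and the helper names `worst_effect` and `flight_can_continue_normally` are hypothetical and not part of the lecture.

```python
from enum import IntEnum


class Severity(IntEnum):
    """External failure effects, ordered from least to most severe."""
    BENIGN = 1
    SIGNIFICANT = 2
    SERIOUS = 3
    CATASTROPHIC = 4


def worst_effect(effects):
    """Return the most severe level among the effects of a failure."""
    return max(effects)


def flight_can_continue_normally(severity):
    """In the aircraft example, a serious or worse effect rules out a normal flight."""
    return severity < Severity.SERIOUS


observed = [Severity.BENIGN, Severity.SERIOUS, Severity.SIGNIFICANT]
print(worst_effect(observed).name)
print(flight_can_continue_normally(worst_effect(observed)))
```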

Computer Systems and Associated Faults (review)

Sensors Used in Computer Systems

Sensors Used in Computer Systems

Sensor Hardware Failure Modes

Explanation of Sensor Failure Modes

Sensor Faults and Time Responses

Sensor Fault Effects

Effectors Used in Computer Systems

Effector Failure Modes

Data Communication Link Failure Modes

• Two basic failure modes
  – Failure of receipt or transmission of data
  – Alteration of received or transmitted data
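The two link failure modes call for different checks on the receiving side: missing data is noticed by its absence, altered data by an integrity check. The sketch below assumes a CRC-32 checksum appended to each frame and represents a frame that never arrived as `None`; both choices, and the names `make_frame` and `check_frame`, are illustrative assumptions rather than anything prescribed by the lecture.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (sender side)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame):
    """Classify a received frame by link failure mode (receiver side)."""
    if frame is None:
        return "not received"  # failure of receipt or transmission
    if len(frame) < 4:
        return "altered"       # too short to even hold the checksum
    payload = frame[:-4]
    received_crc = int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        return "altered"       # alteration of the data in transit
    return "ok"


frame = make_frame(b"altitude=1200")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit

print(check_frame(frame))
print(check_frame(corrupted))
print(check_frame(None))
```

A checksum only detects alteration; recovering from either mode would need something more, such as retransmission or a redundant link.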

Power and Interconnect Failure Modes

Computer Hardware Failure Modes and Effects (1)

Computer Hardware Failure Modes and Effects (2)

Computer Hardware Failure Modes and Effects (3)

Sensor-Computer Interface

Sensor-Computer Interface Failure Modes and Effects

Effector-Computer Interface

Effector-Computer Interface Failure Modes and Effects

Software Faults and Effects

• Classification of software faults based on origin
  – Application software faults
  – System software faults
  – Development software faults

• Application software fault modes and effects
  – Misinterpreted requirements
  – Incorrect software design or implementation
  – Clerical errors
  – Computer virus

Reading and Exercise

• Reading
  – Textbook (WRD): Chapter 3
  – Further reading: Book (GM): Chapter 4

• Exercise
  – Try to recognize the failure modes and their effects in a practical system of your choice.