Course Outline FP7-4: Introduction to Reliability and Fault Tolerance...
Lecture 2: Faults and Their Effects
• Fault-Error-Failure
• Characteristics of faults
• Classification of faults
• Failure modes and effects
Reading: [WRD] Chapter 3
FP7-4: Introduction to Reliability and Fault Tolerance
Lecture 2, Lecture Notes on Introduction to Reliability and Fault Tolerance, by Youmin Zhang (AUE)
Course Outline
1. General introduction to reliability and fault tolerance
2. Faults and their effects
3. Redundancy and fault tolerance techniques
4. Analysis and safety standards of dependable/safety-critical systems
5. Case study: Redundancy and fault tolerance techniques in Boeing 777 primary flight control/computer systems
Today’s Goal
You should be able to understand
• What are faults, errors, and failures?
• Characteristics of faults
• Classification of faults
• Failure modes in computer systems
• Failure effects in computer systems
Fault-Error-Failure (review)
• Fault: deviation of function from design value
– Hardware
– Software
• Error: manifestation of fault by incorrect value
• Failure: deviation of system function from specification
• Three-world model:
Cause-Effect Sequence of Faults
• Passive or dormant fault
– A fault is dormant or passive if it is present in the product but the functioning inside the product is not disturbed
• Active fault
– The fault is active if it has an effect on the product functioning
– The effect of a fault is manifested by an error in an internal component or module. The transformation of a fault into an error is called fault activation
• Error propagation
– The error is propagated inside the product until it reaches the outputs of the product, hence creating a failure
Latency
• Initial activation
– A fault remains passive until an error is produced in a module of the structure of the product. The first occurrence of an error provoked by the fault is called initial activation
• Latency
– Latency is the mean time between the fault occurrence and its initial activation as an error
(Geffroy and Motet, 2002)
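The dormant/active distinction and the idea of latency can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Module` class, the stuck-at-0 fault on bit 7, and the integer timestamps are hypothetical and do not come from the lecture. The sketch records one observed delay between fault occurrence and initial activation; averaging such delays over many runs would give the mean value that the slide calls latency.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    """Toy 8-bit module whose fault stays dormant until an input exercises it."""
    fault_time: Optional[int] = None       # when the fault occurred
    activation_time: Optional[int] = None  # when it first produced an error

    def inject_fault(self, now: int) -> None:
        self.fault_time = now

    def compute(self, x: int, now: int) -> int:
        correct = x & 0xFF
        if self.fault_time is None:
            return correct
        # Hypothetical fault model: output bit 7 is stuck at 0.
        faulty = correct & ~0x80
        if faulty != correct and self.activation_time is None:
            # Initial activation: the fault manifests as an error for the first time.
            self.activation_time = now
        return faulty

    def observed_latency(self) -> Optional[int]:
        """Delay between fault occurrence and initial activation, if both happened."""
        if self.fault_time is None or self.activation_time is None:
            return None
        return self.activation_time - self.fault_time


module = Module()
module.inject_fault(now=10)
# Inputs 5 and 100 do not use bit 7, so the fault stays dormant; 200 activates it.
outputs = [module.compute(x, now=t) for t, x in [(11, 5), (12, 100), (13, 200)]]
print(outputs, module.observed_latency())
```

The wrong value returned for the input 200 is the error; if it reaches the module's user unchecked, it becomes a failure in the sense of the cause-effect sequence.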
Attributes of a Fault
• Cause
– That leads to the fault
• Nature
– Relates to the intent of the cause of fault
• Duration
– Length of time for which the fault persists
• Extent
– How far does a fault propagate?
• Value
– Consequence of the fault
Origin of Faults
• Phenomenological
– Physical
  • Adverse phenomena: threshold changes, open, short
– Human made
  • Results from human imperfection: operator mistake
• System Boundaries
– Internal faults
  • Parts of the system that produce an error when invoked
– External faults
  • Interference or interaction with the physical environment: EMI, electrostatic perturbation, radiation, temperature, humidity, mechanical vibration, power surge, cosmic rays, α-particle hits, etc.
Origin of Faults (cont’d)
• Phase of Creation
– Design faults
  • During development and modification
– During establishment of the procedures
– Implementation
  • Faulty/incorrect component, construction, wiring, coding
• Phase during Operation
– Operation faults
– Occur during system use
– Wear-out of components
– Overrun of specification
– Operator error
Origin of Faults (cont’d)
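[Figure: cause-effect chain. Causes (specification mistakes, implementation mistakes, component defects, external disturbances) lead to hardware faults and software faults, then to errors, then to system failures. Barriers shown along the chain: fault avoidance, fault masking, fault tolerance.]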
Origin of Faults (cont’d)
Types of Faults (temporal/permanent persistence)
• Permanent faults:
– Total failure of a component
– Caused by, for example, short-circuits or melt-down
– Remains until component is repaired or replaced
• Transient faults:
– Temporary malfunctions of a component
– Caused by magnetic or ionizing radiation, or power fluctuation
• Intermittent faults:
– Repeated occurrences of transient faults
– Caused by, for example, loose wires
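The three persistence classes can be told apart, roughly, from a record of when the fault was observed. The following is a minimal sketch under stated assumptions: the function name `classify_persistence`, the periodic boolean sampling, and the decision rule are illustrative choices, not part of the lecture. In particular, a fault that is still present at the end of the observation window is labelled permanent here, although it could be a long transient.

```python
from typing import Sequence


def classify_persistence(observed: Sequence[bool]) -> str:
    """Classify a fault from periodic samples (True = fault observed)."""
    # Count separate episodes: each False -> True transition starts a new one.
    episodes = 0
    previous = False
    for sample in observed:
        if sample and not previous:
            episodes += 1
        previous = sample

    if episodes == 0:
        return "none"
    if episodes > 1:
        # Repeated occurrences of a transient-like fault.
        return "intermittent"
    # Exactly one episode: permanent if it has not gone away by the end.
    return "permanent" if observed[-1] else "transient"


print(classify_persistence([False, True, True, True]))
print(classify_persistence([False, True, False, False]))
print(classify_persistence([True, False, True, False, True]))
```

A real monitor would also need a repair or replacement event to close a permanent fault, which this sketch does not model.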
Types of Faults (temporal/permanent persistence) (cont’d)
[Figure: causes (physical defect, incorrect design, unstable or marginal hardware, unstable environment, operator mistake) lead to permanent, intermittent, or transient faults, which lead to an error and then a failure.]
Characteristics of Faults
Nature of Faults (functional vs. technological)
• Functional faults:
– Also called conceptual faults or human-made faults
– Affect the way a product is specified, designed, produced or used
– An incorrect design caused by the omission of one piece of the specification is an example of a functional fault
– Failures due to functional faults are named systematic failures
• Technological faults:
– Affect the means of implementation
– Often relate to hardware faults and physical faults
– Failures due to technological faults are named disruptive failures or disruptions
Fault Classification
(Avizienis et al, 2001)
Fault Classification (cont’d)
(Avizienis et al, 2001)
Failure Effects
• Internal effects
– Fault-Error-Failure
• External effects: consequences
– Benign: no serious consequences - minor failure
– Significant: major failure
– Serious: dangerous failure
– Catastrophic: disastrous failure
(Geffroy and Motet, 2002)
Failure Effects - An Example
• External effects: consequences
– Benign: when the failure provokes only a partial reduction of the functions of the aircraft
– Significant: when a significant reduction of the functions of the aircraft is induced by the failure
– Serious: when the reduction of the functions of the aircraft does not allow a normal achievement of the flight
– Catastrophic: when the flight cannot be continued, or the landing is impossible
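The four consequence levels form an ordered scale, which can be captured as an ordered enumeration. This is only a sketch: the `Severity` class, the `worst_case` helper, and the failure-mode names are hypothetical and are not taken from the lecture or from any safety standard.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """External failure effects, ordered from least to most severe."""
    BENIGN = 1        # minor failure
    SIGNIFICANT = 2   # major failure
    SERIOUS = 3       # dangerous failure
    CATASTROPHIC = 4  # disastrous failure


def worst_case(effects: Iterable[Severity]) -> Severity:
    """Return the most severe effect; raises ValueError if none are given."""
    return max(effects)


# Hypothetical failure modes of one subsystem and their assessed effects.
assessed = {
    "failure_mode_a": Severity.BENIGN,
    "failure_mode_b": Severity.SERIOUS,
    "failure_mode_c": Severity.SIGNIFICANT,
}
print(worst_case(assessed.values()).name)
```

Using an ordered type makes comparisons such as `effect >= Severity.SERIOUS` straightforward when deciding which failure modes need the most attention.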
Computer Systems and Associated Faults (review)
Sensors Used in Computer Systems
Sensors Used in Computer Systems
Sensor Hardware Failure Modes
Explanation of Sensor Failure Modes
Sensor Faults and Time Responses
Sensor Fault Effects
Effectors Used in Computer Systems
Effector Failure Modes
Data Communication Link Failure Modes
• Two basic failure modes
– Failure of receipt or transmission of data
– Alteration of received or transmitted data
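A receiver can distinguish the two link failure modes with a sequence number (to notice missing data) and a checksum (to notice altered data). The sketch below is one possible arrangement, not a protocol described in the lecture: the frame layout, the helper names `make_frame` and `check_frame`, and the use of `None` to represent a receive timeout are all assumptions. It relies on `zlib.crc32` and `struct` from the Python standard library.

```python
import struct
import zlib
from typing import Optional


def make_frame(seq: int, payload: bytes) -> bytes:
    """Frame = 4-byte sequence number + payload + 4-byte CRC-32 of both."""
    body = struct.pack(">I", seq) + payload
    return body + struct.pack(">I", zlib.crc32(body))


def check_frame(frame: Optional[bytes], expected_seq: int) -> str:
    """Return 'ok', 'lost', or 'altered' for one receive attempt."""
    if frame is None:
        # Nothing arrived before the (assumed) timeout: failure of receipt.
        return "lost"
    if len(frame) < 8:
        # Too short to hold a sequence number and a CRC: treat as altered.
        return "altered"
    body = frame[:-4]
    (crc,) = struct.unpack(">I", frame[-4:])
    if zlib.crc32(body) != crc:
        return "altered"
    (seq,) = struct.unpack(">I", body[:4])
    # A gap in sequence numbers means an earlier frame never arrived.
    return "ok" if seq == expected_seq else "lost"


good = make_frame(7, b"altitude=1200")
# Flip one bit in the payload to imitate alteration on the link.
corrupted = good[:5] + bytes([good[5] ^ 0x01]) + good[6:]
print(check_frame(good, 7), check_frame(corrupted, 7), check_frame(None, 7))
```

A CRC detects accidental alteration only; it offers no protection against deliberate tampering, which would need a cryptographic integrity check.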
Power and Interconnect Failure Modes
Computer Hardware Failure Modes and Effects (1)
Computer Hardware Failure Modes and Effects (2)
Computer Hardware Failure Modes and Effects (3)
Sensor-Computer Interface
Sensor-Computer Interface Failure Modes and Effects
Effector-Computer Interface
Effector-Computer Interface Failure Modes and Effects
Software Faults and Effects
• Classification of software faults based on origin
– Application software faults
– System software faults
– Development software faults
• Application software fault modes and effects
– Misinterpreted requirements
– Incorrect software design or implementation
– Clerical errors
– Computer virus
Reading and Exercise
• Reading
– Textbook (WRD): Chapter 3
– Further reading: Book (GM): Chapter 4
• Exercise
– Try to recognize the failure modes and their effects in a practical system of your choice.