CIS 573 Computer Aided Verification Carl A. Gunter Fall 1999 Part 3.
CIS 573: Computer Aided Verification
Carl A. Gunter, Fall 1999
Part 3
London Ambulance Service
Between October 26 and November 4 of 1992 a Computer Aided Dispatch system for the London area failed.
Key system problems included:
» need for near-perfect input information
» poor interfaces between the ambulance crews and the system
» unacceptable reliability and performance of the software
Consequences are difficult to measure, but severe in some cases.
LAS Components
Human Factors in Safety
Case Studies
» USS Vincennes
» Three Mile Island
Readings: Leveson, Chapters 5 and 6; Automotive Guidelines; Leveson, Chapter 17.
USS Vincennes
On July 3, 1988 a US Navy Aegis cruiser shot down an Airbus on a regularly scheduled commercial flight (Iran Air Flight 655).
Aegis is one of the Navy's most sophisticated weapon systems.
Aegis itself performed well. Human error was blamed: the captain received false reports from the tactical information coordinator.
Carlucci's suggestion on the user interface: put ``an arrow on [showing] whether it's ascending or descending.''
Three Mile Island
On the morning of 28 March 1979 a cascading sequence of failures caused extensive damage to the nuclear power plant at Three Mile Island near Harrisburg, Pennsylvania.
Although radiation release was small, the repairs and clean-up cost 1 to 1.8 billion dollars.
Confidence in the safety of US nuclear facilities was significantly damaged as well.
Operator error was seen as a major contributing factor.
Generic Nuclear Power Plant
TMI Components
[Slide diagram: TMI accident progression in five numbered stages. Recoverable labels: maintenance failure; valve opens at scram and fails open; water pump failed closed, blocking backup; boiled dry; operator cuts back water flow; high-pressure injection pumps; saturation; let-down activated; alarms; cooling activated; pumps shut off at high level of neutrons; fuel rods rupture; valve closed; water injected; hydrogen explosion.]
Level 2 Conditions
No training of operators for saturation in the core. Inadequate operating procedures in place:
» Failure to follow rules for the PORV.
» Surveillance tests not adequately verified.
Control room ill-designed:
» 100 alarms in 10 seconds.
» Key indicators poorly placed and key information not displayed clearly (example: cooling water converting to steam had to be inferred from temperature and pressure).
» Instruments off scale.
» Printer not able to keep up.
Level 3 Root Causes
» Design for controllability.
» Lack of attention to human factors.
» Quality assurance limited to safety-critical components.
» Inadequate training.
» Limited licensing procedures.
MISRA Guidelines
Requirements are very domain-specific. Given a sufficiently narrow domain, it is possible
to provide more detailed assistance in requirements determination.
We look at a set of guidelines for establishing user requirements for automotive software and translating these into software requirements.
The guideline is that of the Motor Industry Software Reliability Association in the UK.
Scope of Guidelines
Sample Life Cycle
Need for Integrity Levels
An automotive system must satisfy requirements that it not cause:
» Harm to humans
» Legislation to be broken
» Undue traffic disruption
» Damage to property or the environment (e.g. emissions)
» Undue financial loss to either the manufacturer or owner
Controllability Levels
Uncontrollable: Failures whose effects are not controllable by the vehicle occupants, and which are likely to lead to extremely severe outcomes. The outcome cannot be influenced by a human response.
Difficult to Control: This relates to failures whose effects are not normally controllable by the vehicle occupants but could, under favorable circumstances, be influenced by a mature human response.
Controllability Levels Continued
Debilitating: This relates to failures whose effects are usually controllable by a sensible human response and, whilst there is a reduction in safety margin, can usually be expected to lead to outcomes which are at worst severe.
Distracting: This relates to failures which produce operational limitations, but a normal human response will limit the outcome to no worse than minor.
Nuisance Only: This relates to failures where safety is not normally considered to be affected, and where customer satisfaction is the main consideration.
Initial Integrity Level
To determine an initial integrity level:
» List all hazards that result from all the failures of the system.
» Assess each failure mode identified in the first step to determine its controllability category.
» The failure mode with the highest associated controllability category determines the integrity level of the system.
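As an illustration, the selection rule above can be sketched in Python. The names and data here are invented for this example; the category-to-level mapping follows the MISRA table in these notes.

```python
# Controllability categories mapped to integrity levels,
# per the MISRA guideline table (Uncontrollable = 4 ... Nuisance Only = 0).
INTEGRITY_LEVEL = {
    "uncontrollable": 4,
    "difficult to control": 3,
    "debilitating": 2,
    "distracting": 1,
    "nuisance only": 0,
}

def initial_integrity_level(failure_modes):
    """Given {failure_mode: controllability_category}, return the
    system's initial integrity level: the level of the worst
    (highest-category) failure mode."""
    return max(INTEGRITY_LEVEL[cat.lower()] for cat in failure_modes.values())

# Example: the Nissan Stanza defect analyzed later in these notes --
# loss of powertrain drive, judged Debilitating.
stanza = {"powertrain drive: loss of power": "Debilitating"}
print(initial_integrity_level(stanza))  # -> 2
```

Note that a single Uncontrollable failure mode forces integrity level 4 for the whole system, regardless of how benign the other failure modes are.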
Integrity Analysis
Integrity Levels
Controllability Category Acceptable Failure Rate Integrity Level
Uncontrollable Extremely improbable 4
Difficult to Control Very remote 3
Debilitating Remote 2
Distracting Unlikely 1
Nuisance Only Reasonably Possible 0
Example
Here is an attempt at an analysis of a design defect in the 1983 Nissan Stanza I used to own. (It wasn't a computer error, but a computer error might display similar behavior.)
Hazard: Powertrain drive: loss of power.
Severity Factor: Powertrain performance affected.
Controllability Category: Debilitating.
Integrity Level: 2.
Human Error Probabilities
Extraordinary errors 10**-5: Errors for which it is difficult to conceive how they could occur. Stress free, with powerful cues pointing to success.
Regular errors 10**-4: Errors in regularly performed, commonplace simple tasks with minimum stress.
Errors of commission 10**-3: Errors such as pressing the wrong button or reading the wrong display. Reasonably complex tasks, little time available, some cues necessary.
Human Errors Continued
Errors of Omission 10**-2: Errors where dependence is placed on situation and memory. Complex, unfamiliar task with little feedback and some distraction.
Complex Task Errors 10**-1: Errors in performing highly complex tasks under considerable stress with little time available.
Creative Task Errors 1 to 10**-1: Errors in processes that involve creative thinking, or unfamiliar, complex operations where time is short and stress is high.
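These nominal rates can be combined to estimate the chance of at least one error in a multi-step task, assuming the steps fail independently (a simplifying assumption; the class names and the example task below are invented for illustration):

```python
# Nominal human error probabilities (HEPs) per error class,
# following the table in these notes (creative-task errors omitted,
# since the notes give a range rather than a point value).
NOMINAL_HEP = {
    "extraordinary": 1e-5,   # hard to conceive how the error could occur
    "regular": 1e-4,         # routine, commonplace simple tasks
    "commission": 1e-3,      # wrong button pressed, wrong display read
    "omission": 1e-2,        # reliance on situation and memory
    "complex": 1e-1,         # highly complex task under stress
}

def p_any_error(steps):
    """P(at least one error) over independent task steps, each
    labelled with an error class from NOMINAL_HEP."""
    p_all_ok = 1.0
    for cls in steps:
        p_all_ok *= 1.0 - NOMINAL_HEP[cls]
    return 1.0 - p_all_ok

# Invented example: one routine step, one display reading,
# and one memory-dependent step.
print(p_any_error(["regular", "commission", "omission"]))
```

As expected, the overall error probability is dominated by the weakest link, here the 10**-2 memory-dependent step.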
Recommendations
Level 0 is ISO 9001. Each of the remaining four levels carries a recommendation for process activities on software with hazards at that level.
Areas Covered:
» Specification and design
» Languages and compilers
» Configuration management
» Testing
» Verification and validation
» Access for assessment
Specification and Design
» Structured method.
» Structured method supported by a CASE tool.
» Formal specification for the functions at this level.
» Formal specification of complete system.
» Automated code generation (when available).
Testing
» Show fitness for purpose. Test all safety requirements. Repeatable test plan.
» Black box testing. White box module testing with defined coverage. Stress testing against deadlock. Syntactic static analysis.
» 100% white box module testing. 100% requirements testing. 100% integration testing. Semantic static analysis.
Verification and Validation
» Show tests: are suitable; have been performed; are acceptable; exercise safety features. Traceable correction.
» Structured program review. Show no new faults after corrections.
» Automated static analysis. Proof (argument) of safety properties. Analysis for lack of deadlock. Justify test coverage. Show tests have been suitable.
» All tools to be formally validated (when available). Proof (argument) of code against specification. Proof (argument) for lack of deadlock. Show object code reflects source code.
Access for Assessment
» Requirements and acceptance criteria. QA and product plans. Training policy. System test results.
» Design documents. Software test results. Training structure.
» Techniques, processes, tools. Witness testing. Adequate training. Code.
» Full access to all stages and processes.
Example Architecture
[Slide diagram: study deliverables, software requirements, testing, subcontracts.]
Aristocracy, Democracy, and System Design
Conceptual integrity is the most important consideration in system design.
The ratio of function to conceptual complexity is the ultimate test of system design.
To achieve conceptual integrity, a design must proceed from one mind or a small group of agreeing minds.
A conceptually integrated system is faster to build and to test.
Brooks, The Mythical Man-Month.
Principles of Design
Norman offers the following two principles of good design:
» Provide a good conceptual model.
» Make things visible.
Two important techniques are:
– Provide natural mappings
– Provide feedback
Donald A. Norman, The Psychology of Everyday Things.
Examples of Bad Designs
Elegant doors that give no hint about whether or where to push or pull.
VCRs that provide inadequate feedback to indicate success of actions.
Telephones using too many unmemorable numerical instructions.
Examples of Good Designs
» Original push-button telephones
» Certain kinds of single-handle faucets providing a natural mapping to desired parameters
» Apple “desk-top” computer interface
Do Humans Cause Most Accidents?
From Leveson, Chapter 5:
» 85% of work accidents are due to unsafe acts by humans rather than unsafe conditions.
» 88% of all accidents are caused primarily by dangerous acts of individual workers.
» 60 to 80% of accidents are caused by loss of control of energies in the system.
Caveats
» Data may be biased or incomplete.
» Positive actions are not usually recorded.
» Blame may be based on assuming that operators can overcome all difficulties.
» Operators intervene at the limits.
» Hindsight is 20/20.
» It is hard to separate operator errors from design errors.
Two Examples
A Model of Human Control
Mental Models
The Human as Monitor
» The task may be impossible.
» The operator is dependent on the information provided.
» The information is more indirect.
» Failures may be silent or masked.
» Little activity may result in lowered attention or overreliance.
The Human as Backup
A poorly designed interface may leave operators with diminished proficiency and increased reluctance to intervene.
Fault-intolerant systems may lead to even larger errors.
The design of the system may make it harder to manage in a crisis.
The Human as Partner
The operator may simply be assigned the tasks that the designer cannot figure out how to automate.
The remaining tasks may be complex, and new tasks such as maintenance and monitoring may be added.