Preventing recurrence of industrial control system accident using assurance case
Mirko Napolano, Fumio Machida,
Roberto Pietrantuono, and Domenico Cotroneo
University of Naples Federico II, NEC Corporation
Outline
1. Motivation
2. Assurance of accident recurrence prevention
3. A case study
4. Conclusion
Critical infrastructure systems
▌Critical infrastructure systems
Power grids, gas pipelines, water supplies, communication and transportation services, etc.
They are essential for human lives and a wide variety of social activities
▌Advances and threats
Infrastructure systems are getting smarter
They may confront new types of threats
Accidents can happen
▌Accident in a critical infrastructure system
Ex) PG&E gas pipeline explosion killed 8 people and injured 58
September 9, 2010 - San Bruno, California
Avoid similar accidents in the future by applying lessons learned from the experience
NTSB accident report, PAR-11/01
Understanding what happened
▌Independent public agencies investigate the accident
Authoritative body with experience in the field
Many months to reconstruct the events and assess the causes
Participation of all the stakeholders
▌At the end of this process a final report is published with:
Accident narrative
Systems descriptions and analyses
List of safety recommendations
▌Recommendations are guidelines to solve identified problems
E.g. “The flight management computer needs to be improved in accordance with the design specifications” (issued for an aircraft crash)
Challenge
▌A source of information is available: accident knowledge
Useful for third-party organizations that need to improve their existing systems in the same domain
▌However, the list of recommendations is not enough:
Directed to the concerned system providers
Issued as generic solutions that are not straightforward to apply
Goal
• Learning clearly from experience how to effectively avoid recurrence of similar accidents
Our contribution
• A methodology to structure the accident knowledge through graphical notations and arguments
Approach overview
▌Step 1: ECFMA (Event and Causal Factor Mitigation Analysis)
Graphical representation of events, problems and solutions
Information provided by the whole report (descriptions and recommendations)
▌Step 2: Assurance Case
Argumentation over the mitigation of the discovered problems
Instantiation of a new pattern, “Accident Recurrence Prevention Pattern”
Example of ECFMA
▌ECFA (Events and Causal Factors Analysis): a tool used by investigative agencies as an accident causation model to identify root, direct, and contributory causes
ECFMA introduces a “solution” element connected to a “causal factor”
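One way to picture the ECFMA structure is as a small graph: events form a chain, causal factors attach to events, and solutions attach to causal factors. The Python sketch below is illustrative only. The class names (`Event`, `CausalFactor`, `Solution`), the `kind` labels, the helper `unmitigated`, and the sample chain are assumptions made for this example, not notation taken from the accident report or from any ECFA tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Solution:
    description: str


@dataclass
class CausalFactor:
    description: str
    kind: str  # assumed labels: "root", "direct", or "contributory"
    solutions: List[Solution] = field(default_factory=list)


@dataclass
class Event:
    description: str
    causal_factors: List[CausalFactor] = field(default_factory=list)


def unmitigated(events: List[Event]) -> List[CausalFactor]:
    """Return the causal factors that have no solution attached."""
    return [cf for ev in events for cf in ev.causal_factors if not cf.solutions]


# Hypothetical two-event chain, for illustration only.
chain = [
    Event("Maintenance work starts",
          [CausalFactor("Incomplete work procedure", "root",
                        [Solution("Revise the procedure")])]),
    Event("Valves lose power",
          [CausalFactor("Shared power supply fails", "direct")]),
]

missing = [cf.description for cf in unmitigated(chain)]
print(missing)
```

A check like `unmitigated` makes the point of the added “solution” element concrete: a causal factor without a solution is a gap that the later argument would have to address.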
Assurance case concepts
▌Safety case
A structured argument supported by a body of evidence used for assuring system safety
▌Assurance case
A general argumentation for assuring any kind of system property
▌Goal Structuring Notation (GSN)
A standard graphical notation widely used to describe assurance cases
▌Assurance case patterns
A means of documenting and reusing successful argument structures
Example of assurance case
Accident Recurrence Prevention Pattern
▌Define a new assurance case pattern
Goal is to ensure that similar accidents do not recur in the future
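The slides do not spell out the internal structure of the pattern, so the following Python sketch is only a guess at how such a pattern might be instantiated. It assumes a top-level goal about non-recurrence, a strategy that argues over each identified problem, and one sub-goal per problem supported by its mitigation. The element kinds follow common GSN vocabulary (Goal, Strategy, Context, Solution). The `Node` class, the `instantiate` and `render` helpers, and the placeholder texts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    kind: str  # GSN element type: "Goal", "Strategy", "Context", "Solution"
    text: str
    children: List["Node"] = field(default_factory=list)


def instantiate(accident: str, mitigations: Dict[str, str]) -> Node:
    """Build a goal tree: one sub-goal per problem, each supported by its mitigation."""
    strategy = Node("Strategy", "Argue over each problem identified in the accident analysis")
    for problem, mitigation in mitigations.items():
        sub_goal = Node("Goal", f"'{problem}' is mitigated")
        sub_goal.children.append(Node("Solution", mitigation))
        strategy.children.append(sub_goal)
    top = Node("Goal", f"Accidents similar to {accident} do not recur")
    top.children.append(Node("Context", f"Accident report for {accident}"))
    top.children.append(strategy)
    return top


def render(node: Node, depth: int = 0) -> List[str]:
    """Flatten the tree into indented text lines."""
    lines = ["  " * depth + f"[{node.kind}] {node.text}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Placeholder accident and problem/mitigation pairs, for illustration only.
case = instantiate("accident A", {"problem P1": "mitigation M1",
                                  "problem P2": "mitigation M2"})
outline = render(case)
print("\n".join(outline))
```

Keeping the pattern as a function over the problem-to-mitigation pairs reflects the reuse idea behind assurance case patterns: the argument shape stays fixed while the accident-specific content is filled in.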
Case study: PG&E accident
▌Date and location: September 9, 2010 - San Bruno, California
▌Industrial system: SCADA system managing and controlling a gas pipeline
▌The accident: an explosion in the pipeline caused by an overpressure not adequately managed by the SCADA system
▌Consequences: 8 people killed, 58 injured, and 38 homes destroyed
NTSB accident report, PAR-11/01
Accident analysis
▌Analysis performed using the final report issued by NTSB
▌Problems identified from ECFMA
1. Lack of information in the maintenance work procedures (root cause)
2. Failure of the two redundant power supplies that energize the electrical valves in the station under maintenance (direct cause)
3. Inadequate fail-safe mode (contributory cause)
4. Absence of Remote Control Valves (RCV) (contributory cause)
▌Proposed solutions
1. Maintenance work procedure including requirements for identifying the likelihood and consequences of planned work on the SCADA system
2. Use of separate circuit breakers in the station
3. Use of a fail-closed fail-safe mode
4. Installation of RCVs along all the lines
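The two lists above can be read as a mapping from each identified problem to its proposed solution. The Python sketch below encodes that reading and checks that no problem is left without a solution. Pairing problems and solutions by their list number is an assumption based on the parallel numbering, and the shortened texts and variable names are illustrative.

```python
# Problems and solutions transcribed from the slide, paired by list number
# (the pairing itself is an assumption based on the parallel numbering).
problems = {
    1: ("Lack of information in the maintenance work procedures", "root"),
    2: ("Failure of the two redundant power supplies", "direct"),
    3: ("Inadequate fail-safe mode", "contributory"),
    4: ("Absence of Remote Control Valves (RCV)", "contributory"),
}
solutions = {
    1: "Maintenance work procedure with likelihood and consequence requirements",
    2: "Separate circuit breakers in the station",
    3: "Close fail-safe mode",
    4: "RCVs installed along all the lines",
}

# Every problem should have a solution with the same number.
uncovered = sorted(set(problems) - set(solutions))

# Group problem numbers by cause type for a quick overview.
by_kind = {}
for number, (_, kind) in problems.items():
    by_kind.setdefault(kind, []).append(number)

for number, (text, kind) in problems.items():
    print(f"{number}. [{kind}] {text} -> {solutions.get(number, 'NO SOLUTION')}")
print("uncovered:", uncovered)
```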
PG&E ECFMA: an excerpt
PG&E assurance case
Evaluation
▌Comparison between two possible approaches to improving systems from accident knowledge:
Use of the list of recommendations
Use of the assurance case
▌Consider the report as a structured document composed of nodes (sections, subsections, paragraphs) and links, to be compared against the assurance case nodes
▌Evaluation criteria:
Understandability
Reusability
Effectiveness
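Treating both the report and the assurance case as graphs of nodes and links suggests a simple distance measure: the number of links a reader must follow to get from one node to another. The Python sketch below computes that count with a breadth-first search. The exact counting rules used in the evaluation are not given on the slides, so the directed, unweighted graph model, the `hops` function, and the two toy graphs are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Optional


def hops(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search: fewest links to traverse from start to goal."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is not reachable from start


# Hypothetical report: the hazard and its mitigation sit in different sections.
report = {
    "hazard": ["section 1.2"],
    "section 1.2": ["section 1"],
    "section 1": ["section 4"],
    "section 4": ["recommendation"],
}
# Hypothetical assurance case: the mitigation is linked directly to the hazard.
assurance_case = {"hazard": ["recommendation"]}

report_hops = hops(report, "hazard", "recommendation")
case_hops = hops(assurance_case, "hazard", "recommendation")
print(report_hops, case_hops)
```

Averaging such counts over all hazard and mitigation pairs would give one figure per approach, which is the kind of number a hop-based comparison relies on.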
Results
▌Understandability
#1: Direct links from hazard to mitigation: Recommendations 0/4, Assurance case 4/4
#2: Average hops from hazard to mitigation: Recommendations 24.5, Assurance case 1
▌Reusability
#1: Links from recommendations to hazard context: Recommendations 0/4, Assurance case 4/4
#2: Hops from mitigation to hazard context: Recommendations 31.25, Assurance case 2
▌Effectiveness
Number of mitigated hazards: Recommendations 2, Assurance case 4
▌Assurance case provides more structured and reusable knowledge
Conclusions
▌We presented an approach to create a post-failure assurance case from the accident analysis
▌A new assurance case pattern has been developed to directly use the analysis outcomes about identified problems and solutions
▌Our approach effectively increases understandability and reusability in the system improvement process