Software Safety: An Oxymoron? - VanQ 2007

Software Safety: An Oxymoron? March 29, 2007. Ken Wong, Ph.D., Senior Systems Analyst, McKesson Medical Imaging Group

description

It is a well-known maxim that complexity is an essential property of software. In spite of that, software-implemented functionality has increased dramatically in almost all safety-critical sectors. This increasing reliance on software poses great challenges to the traditional practice of system safety, which focuses on the management of system hazards in order to mitigate safety risk. Software safety has emerged as a sub-discipline of system safety to help address these challenges. However, the marriage of software and system safety has been an uneasy one. This talk discusses some of the issues that arise when software meets safety. It addresses why safety is a property distinct from quality, reliability and the other "-ilities", discusses the impact of software on system safety, and briefly touches upon the need for safety verification of software-intensive systems.

Transcript of Software Safety: An Oxymoron? - VanQ 2007

Page 1: Software Safety: An Oxymoron? - VanQ 2007

Software Safety: An Oxymoron?

March 29, 2007

Ken Wong, Ph.D., Senior Systems Analyst

McKesson Medical Imaging Group

Page 2: Software Safety: An Oxymoron? - VanQ 2007

Points to Ponder*

A system can be correct and reliable and yet unsafe
Software safety is not about bugs
Program testing can be used to show the presence of bugs, but never to show their absence

* We will return to these statements in the discussion

Page 3: Software Safety: An Oxymoron? - VanQ 2007

Outline

Introduction to Software Safety
Software: Meet System Safety
System Safety: Meet Software
Verifying Software Safety

Page 4: Software Safety: An Oxymoron? - VanQ 2007

Introduction to Software Safety

Page 5: Software Safety: An Oxymoron? - VanQ 2007

Software in the Real World

Therac-25 accidents
Ariane 5 Flight 501 explosion
Titan 4 Centaur/Milstar failure
TCAS collision near Überlingen, Germany

Page 6: Software Safety: An Oxymoron? - VanQ 2007

Ariane 501

Page 7: Software Safety: An Oxymoron? - VanQ 2007

Ariane 501 Events

Destruction of Ariane 501 on 4 June 1996 (from final report):

nominal behaviour of the launcher up to H0 + 36 seconds;
failure of the back-up Inertial Reference System (SRI) followed immediately by failure of the active SRI;

Page 8: Software Safety: An Oxymoron? - VanQ 2007

Building Dependable Software …

Security

Safety

Reliability

Correctness

Quality

Page 9: Software Safety: An Oxymoron? - VanQ 2007

Safety is a Distinct Property

Safety is a distinct part of the interlocking puzzle of how to build dependable software

A system can be “correct” and “reliable” and yet unsafe!
Improved software process alone does not mean a safer system

Note: These can be contentious claims even among safety engineers.

Page 10: Software Safety: An Oxymoron? - VanQ 2007

Safety is …

avoiding mishaps!

Page 11: Software Safety: An Oxymoron? - VanQ 2007

Software: Meet System Safety

Page 12: Software Safety: An Oxymoron? - VanQ 2007

“Is it Safe”?

Christian Szell: Is it safe?
Babe: Yes, it's safe, it's very safe, it's so safe you wouldn't believe it.
- Marathon Man, 1976

Page 13: Software Safety: An Oxymoron? - VanQ 2007

System Safety

“System Safety” is a systematic approach to safety primarily developed in the US for the aerospace and defense industries

  Spreading to other industries, e.g., health care

Focus on managing system hazards
  E.g., FDA Quality System Regulation recommends “risk analysis” (A.K.A. hazard analysis)

Page 14: Software Safety: An Oxymoron? - VanQ 2007

System Safety

Hazard ID

Hazard Analysis

Risk Assessment

Hazard Mitigation

Safety Verification
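The five steps on this slide can be read as the life cycle of a single hazard record. The following is a minimal sketch of that reading, not something taken from the talk: the `Hazard` class, the four-level `Severity` and `Likelihood` scales, and the multiplicative `risk_index` are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Hypothetical four-level scale; real programs define their own.
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    # Hypothetical four-level scale.
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


@dataclass
class Hazard:
    name: str                          # Hazard ID
    causes: List[str]                  # Hazard Analysis
    severity: Severity                 # Risk Assessment inputs
    likelihood: Likelihood
    mitigations: List[str] = field(default_factory=list)  # Hazard Mitigation
    verified: bool = False             # Safety Verification

    def risk_index(self) -> int:
        """Simple severity x likelihood score (an assumed convention)."""
        return int(self.severity) * int(self.likelihood)

    def is_closed(self) -> bool:
        """A hazard is closed only when it is mitigated AND verified."""
        return bool(self.mitigations) and self.verified
```

The point of `is_closed` is that recording a mitigation is not enough: the record stays open until safety verification has confirmed that the mitigation works.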

Page 15: Software Safety: An Oxymoron? - VanQ 2007

Hazard

A hazard is the system’s potential contribution to a mishap

  E.g., brake failure, engine overheating

Key is understanding the system environment

Page 16: Software Safety: An Oxymoron? - VanQ 2007

Hazards and Mishaps

[Diagram labels: hazard causes, hazard, mishap; System; Environment]

Page 17: Software Safety: An Oxymoron? - VanQ 2007

Ariane 501: SRI Bug?

Uncaught exception from floating point conversion

  From high value of BH (Horizontal Bias)
  Programming 101!

Conversion check deliberately removed for performance reasons

  SRI reused from Ariane 4
  Check not required for Ariane 4 trajectory
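The failure mode described above can be sketched in a few lines. The Inquiry Board report describes the conversion as going from a 64-bit floating point value to a 16-bit signed integer; everything else here is an assumption made for illustration, including the function names, the `OperandError` class, the saturating alternative, and the sample values, which are not real trajectory data.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Raised when a value does not fit in a 16-bit signed integer."""


def to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer; raise if the result overflows."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return result


def to_int16_saturating(value: float) -> int:
    """One possible protection (an assumption): clamp instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


# Illustrative values only, NOT real flight data.
ARIANE_4_LIKE_BH = 20000.0   # stays inside the 16-bit range
ARIANE_5_LIKE_BH = 50000.0   # exceeds the 16-bit range
```

With the smaller value, `to_int16` behaves identically with or without a handler around it, which is why leaving the conversion unprotected was a defensible decision for the Ariane 4 trajectory. With the larger value, the same code raises, and a caller that does not catch `OperandError` goes down with it.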

Page 18: Software Safety: An Oxymoron? - VanQ 2007

Safety is a System Property

SRI worked exactly as specified – for Ariane 4!
  Ariane 5 trajectory different from Ariane 4
  SRI spec did NOT include Ariane 5 trajectory data
  SRI NOT tested with Ariane 5 trajectory data

“Safety” cannot be understood without knowing the operational environment

  FDA “use-related” vs. “device failure” hazards
  E.g., TCAS collision in Germany

Page 19: Software Safety: An Oxymoron? - VanQ 2007

When Software Met Safety

… there was a definite risk in assuming that critical equipment such as the SRI had been validated by qualification on its own, or by previous use on Ariane 4.

ARIANE 5 Flight 501 Failure Report

Page 20: Software Safety: An Oxymoron? - VanQ 2007

System Safety: Meet Software

Page 21: Software Safety: An Oxymoron? - VanQ 2007

In the beginning (or Europe) …*

Mechanical systems with well understood designs
Hazards caused by component failure from random hardware faults

  Mitigation through integrity and redundancy

* Myth, but there is underlying truth in all good myths

Page 22: Software Safety: An Oxymoron? - VanQ 2007

Fault Tree Analysis

[Figure: fault tree with top event "Steering Fails". Events shown: Steering Wheel Fails, Steering Assembly Fails, Driver Error, Drive Shaft Fails, Steering Control Software Fails, combined through OR gates. Legend: Basic Event, Intermediate Event.]
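A fault tree like this one is usually quantified by assigning a probability to each basic event and propagating upward through the gates. The sketch below assumes independent basic events, so an OR gate evaluates to 1 minus the product of the complements. The tree shape is one reading of the flattened diagram (Steering Assembly Fails as an intermediate event over Drive Shaft Fails and Steering Control Software Fails), and every number is hypothetical.

```python
from math import prod


def or_gate(*probabilities: float) -> float:
    """Probability that at least one input event occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def steering_fails(p_software: float) -> float:
    """Top-event probability as a function of the software failure probability."""
    # Hypothetical probabilities for the hardware and human basic events.
    p_wheel, p_shaft, p_driver = 1e-5, 1e-5, 1e-4
    p_assembly = or_gate(p_shaft, p_software)      # intermediate event
    return or_gate(p_wheel, p_assembly, p_driver)  # top event
```

The hardware numbers can come from wear-out and field-failure data. The `p_software` argument has no comparable source, so the result of `steering_fails` is only as meaningful as whatever value is plugged in for it.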

Page 23: Software Safety: An Oxymoron? - VanQ 2007

Is Software Another Component?

What is the probability that the steering control software fails?
If software is just another component:

1. Software cannot wear out or break down like a mechanical component
2. Only “fault” is a programming bug
3. Assuming programmers do their job, failure rate should be zero*

* Paraphrased from talk by a system safety engineer

Page 24: Software Safety: An Oxymoron? - VanQ 2007

Software Revealed

[Figure: fault tree with top event "Steering Fails". Events shown: Steering Wheel, Steering Assembly Fails, Driver Error, Drive Shaft Fails, Steering Control Software Fails, combined through OR gates. Legend: Basic Event, Intermediate Event.]

Page 25: Software Safety: An Oxymoron? - VanQ 2007

The Software Werewolf

Of all the monsters that fill the nightmares of our folklore, none terrify more than werewolves, because they transform unexpectedly from the familiar into horrors … The familiar software project, at least as seen by the nontechnical manager, has something of this character …

Frederick P. Brooks, Jr., from No Silver Bullet: Essence and Accidents of Software Engineering

Page 26: Software Safety: An Oxymoron? - VanQ 2007

Ariane 501: Safety in Numbers?

In response to “fault”, the Primary SRI was deliberately shut down

  Attempt made to switch to backup SRI
  Typical strategy in face of random failures

However, BOTH SRIs shut down!
  “Fault” due to same design in both SRIs
  Exception in non-essential component
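Redundancy pays off when failures are independent: if each unit fails randomly with probability p, both fail together with probability p². A design fault is not random. Two units running the same code on the same input fail together every time. The sketch below illustrates that point under stated assumptions; `InertialUnit`, `read_with_failover`, the 16-bit limit check, and the sample values are hypothetical stand-ins, not the actual SRI logic.

```python
from typing import Optional


class InertialUnit:
    """Hypothetical unit that shuts itself down on an out-of-range reading."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def process(self, horizontal_bias: float) -> int:
        value = int(horizontal_bias)
        if not -32768 <= value <= 32767:
            # Treated like a random hardware fault: stop and hand over.
            self.running = False
            raise RuntimeError(f"{self.name}: operand error, shutting down")
        return value


def read_with_failover(primary: InertialUnit, backup: InertialUnit,
                       horizontal_bias: float) -> Optional[int]:
    """Try the primary unit, then the backup; return None if both shut down."""
    for unit in (primary, backup):
        try:
            return unit.process(horizontal_bias)
        except RuntimeError:
            continue
    return None
```

Calling `read_with_failover` with an in-range value never touches the backup. With an out-of-range value, the backup receives the same input, runs the same code, and shuts down for the same reason, so the failover buys nothing.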

Page 27: Software Safety: An Oxymoron? - VanQ 2007

Safety is an Emergent Property

Software safety is not about “faults”
  Many potential “faults” but not all created equal – most have no impact on safety

“Correct” behaviour can contribute to the hazard!

  Hazards can emerge from complex interactions between “correct” components

Page 28: Software Safety: An Oxymoron? - VanQ 2007

When Safety Met Software

An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure.
Board wishes to point out that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system.

ARIANE 5 Flight 501 Failure Report

Page 29: Software Safety: An Oxymoron? - VanQ 2007

Verifying Software Safety

Page 30: Software Safety: An Oxymoron? - VanQ 2007

Software and Safety Process

Requirements

Design

Hazards

Source Code

Hazard ID, Analysis and Mitigation

Verification

Safety Verification

Page 31: Software Safety: An Oxymoron? - VanQ 2007

Limits of Testing

Program testing can be used to show the presence of bugs, but never to show their absence

E. Dijkstra, in Structured Programming

Page 32: Software Safety: An Oxymoron? - VanQ 2007

Hazard-Driven Testing

Focus on hazard – force it to occur
Consider:

  Hazard risk (“risk-based testing”)
  Mishap scenarios
  Hazard causes identified during hazard analysis
  Problem reports/issues with safety implications

See Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software
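One simple way to act on the first consideration, hazard risk, is to order the test queue by the risk of the hazard each test tries to force. The following is a sketch under assumptions, not the method from the cited paper: `HazardTest`, the 1 to 4 scales, and the severity times likelihood score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HazardTest:
    name: str        # test that tries to force the hazard to occur
    hazard: str      # hazard identified during hazard analysis
    severity: int    # assumed scale: 1 (negligible) to 4 (catastrophic)
    likelihood: int  # assumed scale: 1 (improbable) to 4 (frequent)

    def risk(self) -> int:
        """Severity x likelihood score (an assumed convention)."""
        return self.severity * self.likelihood


def prioritize(tests: Iterable[HazardTest]) -> List[HazardTest]:
    """Return the tests ordered from highest to lowest hazard risk."""
    # sorted() is stable, so tests with equal risk keep their input order.
    return sorted(tests, key=lambda t: t.risk(), reverse=True)
```

Mishap scenarios, hazard causes, and safety-related problem reports would feed the same queue by adding more `HazardTest` entries for the hazards they point to.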

Page 33: Software Safety: An Oxymoron? - VanQ 2007

Summary and Conclusions

Safety is a distinct property
Safety is a system property

  Operational and development environment factors

Safety is an emergent property
  Hazards can emerge from complex interactions between “correct” components

Page 34: Software Safety: An Oxymoron? - VanQ 2007

Safety and Software: Happy Together?

Page 35: Software Safety: An Oxymoron? - VanQ 2007

References*

ARIANE 5 Flight 501 Failure Report by the Inquiry Board, Paris, July 1996
Frederick P. Brooks, Jr., No Silver Bullet: Essence and Accidents of Software Engineering, Computer Magazine, April 1987
Jeffrey J. Joyce and Ken Wong, Hazard-driven Testing of Safety-Related Software, 21st International System Safety Conference, Ottawa, Ontario, August 4-8, 2003

* All available on-line