8/11/2019 Safety in Integrated Systems
Safety in Integrated Systems Health Engineering and Management¹
Nancy G. Leveson
Aeronautics and Astronautics
Engineering Systems
MIT

Abstract: This paper describes the state of the art in system safety engineering and management along with new models of accident causation, based on systems theory, that may allow us to greatly expand the power of the techniques and tools we use. The new models consider hardware, software, humans, management decision-making, and organizational design as an integrated whole. New hazard analysis techniques based on these expanded models of causation provide a means for obtaining the information necessary to design safety into the system and to determine which are the most critical parameters to monitor during operations and how to respond to them. The paper first describes and contrasts the current system safety and reliability engineering approaches to safety and the traditional methods used in both of these fields. It then outlines the new systems-theoretic approach being developed in Europe and the U.S. and the application of the new approach to aerospace systems, including a recent risk analysis and health assessment of the NASA manned space program management structure and safety culture that used the new approach.
Reliability Engineering vs. System Safety
Although safety has been a concern of engineering for centuries and references to designing for safety appear in the 19th century (Cooper, 1891), modern approaches to engineering safety appeared as a result of changes that occurred in the mid-1900s.
In the years following World War II, the growth in military electronics gave rise to reliability engineering. Reliability was also important to NASA and our space efforts, as evidenced by the high failure rate of space missions in the late 1950s and early 1960s, and was adopted as a basic approach to achieving mission success. Reliability engineering is concerned primarily with failures and failure rate reduction. The reliability engineering approach to safety thus concentrates on failure as the cause of accidents. A variety of techniques are used to minimize component failures and thereby the failures of complex systems caused by component failure. These reliability engineering techniques include parallel redundancy, standby sparing, built-in safety factors and margins, derating, screening, and timed replacements.
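As a rough illustration of why redundancy was so attractive to reliability engineers, the sketch below computes the reliability of n identical units in parallel redundancy, R_sys = 1 - (1 - r)^n, where r is the probability that a single unit survives the mission. It assumes the units fail independently, which is exactly the kind of assumption this paper goes on to question; the function name is hypothetical.

```python
def parallel_reliability(r: float, n: int) -> float:
    """Reliability of n identical units in parallel redundancy.

    Assumes unit failures are statistically independent; common-cause
    failures would make the true reliability lower than this value.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The redundant system fails only if every one of the n units fails.
    return 1.0 - (1.0 - r) ** n


for n in (1, 2, 3):
    print(n, parallel_reliability(0.9, n))
```

With r = 0.9, one unit gives 0.9 and two give roughly 0.99 under this model. The calculation speaks only to failure, not to whether the system is safe.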
System Safety arose around the same time, primarily in the defense industry. Although many of the basic concepts of system safety, such as anticipating hazards and accidents and building in safety, predate the post-World War II period, much of the early development of system safety as a separate discipline began with flight engineers immediately after the War and was developed into a mature discipline in the early ballistic missile programs of the 1950s and 1960s. The space program was the second major application area to apply system safety approaches in a disciplined manner. After the Apollo 204 fire in 1967, NASA hired Jerome Lederer, the head of the Flight Safety Foundation, to head manned space-flight safety and, later, all safety activities. Through his leadership, an extensive program of system safety was established for space projects, much of it patterned after the Air Force and DoD programs.

While traditional reliability engineering techniques are often effective in increasing reliability, they do not necessarily increase safety. In fact, their use under some conditions may actually reduce safety. For example, increasing the burst-pressure to working-pressure ratio of a tank often introduces new dangers of an explosion or chemical reaction in the event of a rupture.
¹ This paper is a draft paper for the NASA Ames Integrated System Health Engineering and Management Forum, November 2005. The research reported in the paper was partly supported by the Center for
System safety, in contrast to the reliability engineering focus on preventing failure, is primarily concerned with the management of hazards: their identification, evaluation, elimination, and control through analysis, design, and management procedures. In the case of the tank rupture, system safety would look at the interactions among the system components rather than just at failures or engineering strengths.

Reliability engineers often assume that reliability and safety are synonymous, but this assumption is true only in special cases. In general, safety has a broader scope than failures, and failures may not compromise safety. There is obviously an overlap between reliability and safety, but many accidents occur without any component failure, e.g., the Mars
Techniques such as fault tree analysis or failure modes and effects analysis attempt to delineate all the relevant chains of events that can lead to the loss event being investigated.

The event chain models that result from this analysis form the basis for most reliability and safety engineering design techniques, such as redundancy, overdesign, safety margins, interlocks, etc. The designer attempts to interrupt the chain of events by preventing the occurrence of events in the chain. Figure 1 is annotated with some potential design techniques that might be used to interrupt the flow of events to the loss state. Not shown, but very common, is to introduce "AND" relationships in the chain, i.e., to design such that two or more failures must happen for the chain to continue toward the loss state, thus reducing the probability of the loss occurring.
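The effect of introducing "AND" relationships can be sketched with a toy fault-tree evaluator. This is a minimal illustration, not a method from the paper: it assumes basic events are independent and have known probabilities, and the events and numbers below are hypothetical. Under independence, an AND gate multiplies its input probabilities, while an OR gate occurs unless none of its inputs occur.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Union


@dataclass
class Basic:
    """A basic event with an assumed probability of occurring."""
    name: str
    p: float


@dataclass
class Gate:
    """An AND or OR gate over child events or gates."""
    kind: str  # "AND" or "OR"
    children: list = field(default_factory=list)


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of a node's event, assuming independent basic events."""
    if isinstance(node, Basic):
        return node.p
    ps = [probability(child) for child in node.children]
    if node.kind == "AND":
        # Every input must occur for the output event to occur.
        return prod(ps)
    if node.kind == "OR":
        # The output occurs unless none of the inputs occurs.
        return 1.0 - prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical design A: a single failure continues the chain to the loss.
design_a = Basic("relief valve fails", 1e-3)
# Hypothetical design B: an independent backup valve must also fail.
design_b = Gate("AND", [Basic("relief valve fails", 1e-3),
                        Basic("backup valve fails", 1e-3)])

print(probability(design_a), probability(design_b))
```

Under these assumptions, requiring a second independent failure drops the computed probability from 1e-3 to about 1e-6. Common-cause failures, or losses that involve no failure at all, are outside what this kind of model can express.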
Figure 1. Chain-of-Events Accident Model Example
Reliability engineering relies on redundancy, increasing component integrity (e.g., incorporating safety margins for physical components and attempting to achieve error-free behavior of the logical and human components), and using "safety" functions during operations to recover from failures (e.g., shutdown and other types of protection systems). System safety uses many of the same techniques, but focuses them on eliminating or preventing hazards. System safety engineers also tend to use a wider variety of design approaches, including various types of interlocks to prevent the system from entering a hazardous state or leaving a safe state.
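An interlock can be sketched as a guard that rejects any command that would move the system into a hazardous state. The sketch uses the high-voltage example constraint that appears later in this paper (power must never be on while access to the power source is open); the class and method names are hypothetical, and the sketch models only the interlock logic, not its physical implementation.

```python
class InterlockError(RuntimeError):
    """Raised when a command would lead to a hazardous state."""


class HighVoltageCabinet:
    """Hypothetical cabinet whose interlock keeps power off while the door is open."""

    def __init__(self) -> None:
        self.door_open = False
        self.power_on = False

    @staticmethod
    def _is_hazardous(door_open: bool, power_on: bool) -> bool:
        # Safety constraint: power must never be on while the door is open.
        return door_open and power_on

    def _transition(self, door_open: bool, power_on: bool) -> None:
        # The interlock: refuse any transition into a hazardous state,
        # leaving the current (safe) state unchanged.
        if self._is_hazardous(door_open, power_on):
            raise InterlockError(
                f"blocked: door_open={door_open}, power_on={power_on}"
            )
        self.door_open, self.power_on = door_open, power_on

    def open_door(self) -> None:
        self._transition(True, self.power_on)

    def close_door(self) -> None:
        self._transition(False, self.power_on)

    def energize(self) -> None:
        self._transition(self.door_open, True)

    def de_energize(self) -> None:
        self._transition(self.door_open, False)


cab = HighVoltageCabinet()
cab.energize()
try:
    cab.open_door()  # would leave the safe state, so it is refused
except InterlockError as err:
    print(err)
cab.de_energize()
cab.open_door()  # allowed once power is off
```

The same guard blocks both directions named in the text: it prevents entering the hazardous state (energizing with the door open) and leaving the safe state (opening the door while energized).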
In summary, reliability engineering focuses on failures while system safety focuses on hazards. These are not equivalent. C.O. Miller, one of the founders of system safety in the 1950s, cautioned that "distinguishing hazards from failures is implicit in understanding the difference between safety and reliability" (Miller, 1985).

While both of these approaches work well with respect to their different goals for the relatively simple systems for which they were designed in the 1950s and 1960s, they are not as effective for the complex systems and new technology common today. The basic hazard analysis techniques have not changed significantly since Fault Tree Analysis was introduced in 1962. The introduction of digital systems and software control, in particular, has had a profound effect in terms of technology that does not satisfy the underlying assumptions of the traditional hazard and reliability analysis and safety engineering techniques. It also allows levels of complexity in our system designs that overwhelm the traditional approaches (Leveson 1995, Leveson 2005). A related new development is the introduction of distributed
human-automation control and the changing role of human operators from controller to high-level supervisor and decision-maker. The simple slips and operational mistakes of the past are being eliminated by substituting automation, resulting in a change in the role humans play in accidents and the substitution of cognitively complex decision-making errors for the simple slips of the past. In the most technically advanced aircraft, the types of errors pilots make have changed but not been eliminated (Billings, 1996).

Another important limitation of the chain-of-events model is that it ignores the social and organizational factors in accidents. Both the Challenger and Columbia accident reports stressed the importance of these factors in accident causation.
Figure 2. Rasmussen Socio-Technical Model for Risk Management
STAMP
In STAMP [...] it misinterpreted noise from a sensor as an indication that the spacecraft had reached the surface of the planet.
Accidents such as these, involving engineering design errors and misunderstanding of the functional requirements (in the case of the Mars
development process, i.e., risk is not adequately managed in the design, implementation, and manufacturing processes. Control is also imposed by the management functions in an organization (the Challenger accident involved inadequate controls in the launch-decision process, for example) and by the social and political system within which the organization exists. The role of all of these factors must be considered in hazard and risk analysis.

Note that the use of the term "control" does not imply a strict military-style command and control structure. Behavior is controlled or influenced not only by direct management intervention, but also indirectly by policies, procedures, shared values, and other aspects of the organizational culture. All behavior is influenced and at least partially "controlled" by the social and organizational context in which the behavior occurs. Engineering this context can be an effective way of creating and changing a safety culture.
Systems are viewed in STAMP as interrelated components that are kept in a state of dynamic equilibrium by feedback loops of information and control. A system is not treated as a static design, but as a dynamic process that is continually adapting to achieve its ends and to react to changes in itself and its environment. The original design must not only enforce appropriate constraints on behavior to ensure safe operation, but it must continue to operate safely as changes and adaptations occur over time. Accidents, then, are considered to result from dysfunctional interactions among the system components (including both the physical system components and the organizational and human components) that violate the system safety constraints. The process leading up to an accident can be described in terms of an adaptive feedback function that fails to maintain safety as performance changes over time to meet a complex set of goals and values. The accident or loss itself results not simply from component failure (which is treated as a symptom of the problems) but from inadequate control of safety-related constraints on the development, design, construction, and operation of the socio-technical system.

While events reflect the effects of dysfunctional interactions and inadequate enforcement of safety constraints, the inadequate control itself is only indirectly reflected by the events; the events are the result of the inadequate control. The system control structure itself, therefore, must be examined to determine how unsafe events might occur and whether the controls are adequate to maintain the required constraints on safe behavior.
STAMP has three fundamental concepts: constraints, hierarchical levels of control, and process models. These concepts, in turn, give rise to a classification of control flaws that can lead to accidents.
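To make the idea of a process model concrete, the hypothetical sketch below models a descent controller that acts only on its internal belief about the process, which it updates from sensor feedback. The control rule is correct with respect to that belief, yet a single spurious touchdown signal (in the spirit of the misinterpreted sensor noise mentioned earlier) makes the model diverge from reality and yields an unsafe control action. The names, altitudes, and the latching behavior of the model are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessModel:
    """The controller's belief about the state of the controlled process."""
    touched_down: bool = False


class DescentController:
    """Hypothetical controller: keeps engines on until it believes touchdown occurred."""

    def __init__(self) -> None:
        self.model = ProcessModel()

    def update(self, touchdown_signal: bool) -> None:
        # Feedback path: the model changes only through sensor readings,
        # and once set, the touchdown belief is never cleared (latched).
        if touchdown_signal:
            self.model.touched_down = True

    def command(self) -> str:
        # The control rule is correct relative to the process model.
        return "ENGINES_OFF" if self.model.touched_down else "ENGINES_ON"


def run(altitudes, signals):
    """Return (altitude, command, unsafe) for each step of a descent."""
    ctrl = DescentController()
    log = []
    for alt, sig in zip(altitudes, signals):
        ctrl.update(sig)
        cmd = ctrl.command()
        # Safety constraint: engines must stay on until the surface is reached.
        unsafe = cmd == "ENGINES_OFF" and alt > 0
        log.append((alt, cmd, unsafe))
    return log


# A transient (noise) touchdown signal at 40 m latches a wrong belief.
for entry in run([60, 40, 20, 0], [False, True, False, True]):
    print(entry)
```

No component in this sketch fails in the usual sense: the sensor reports a signal, the controller follows its rule, and the hazard comes from the mismatch between the process model and the actual process.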
The most basic component of STAMP is not an event, but a constraint. In systems theory and control theory, systems are viewed as hierarchical structures where each level imposes constraints on the activity of the level below it; that is, constraints or lack of constraints at a higher level allow or control lower-level behavior.
Safety-related constraints specify those relationships among system variables that constitute the nonhazardous or safe system states: for example, the power must never be on when the access to the high-voltage power source is open, the descent engines on the lander must remain on until the spacecraft reaches the planet surface, and two aircraft must never violate minimum separation requirements.

Instead of viewing accidents as the result of an initiating (root cause) event in a chain of events leading to a loss, accidents are viewed as resulting from interactions among components that violate the system safety constraints. The control processes that enforce these constraints must limit system behavior to the safe changes and adaptations implied by the constraints.
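A safety constraint of the kind described above can be expressed as a predicate over the variables of more than one component. The hypothetical sketch below checks a minimum-separation constraint between aircraft over a short trace of positions. Neither aircraft has failed; the violation arises from their interaction. The threshold values are placeholders chosen for the example and are not taken from the paper or from any standard.

```python
from dataclasses import dataclass
from math import hypot

# Placeholder thresholds for illustration only; real separation minima
# depend on airspace, equipment, and procedures.
MIN_LATERAL_NM = 5.0
MIN_VERTICAL_FT = 1000.0


@dataclass(frozen=True)
class Aircraft:
    callsign: str
    x_nm: float
    y_nm: float
    alt_ft: float


def separated(a: Aircraft, b: Aircraft) -> bool:
    """Constraint over two components: lateral OR vertical separation must hold."""
    lateral = hypot(a.x_nm - b.x_nm, a.y_nm - b.y_nm)
    vertical = abs(a.alt_ft - b.alt_ft)
    return lateral >= MIN_LATERAL_NM or vertical >= MIN_VERTICAL_FT


def violations(snapshots):
    """Yield (time step, callsign pair) for every pair violating the constraint."""
    for t, fleet in enumerate(snapshots):
        for i, a in enumerate(fleet):
            for b in fleet[i + 1:]:
                if not separated(a, b):
                    yield t, (a.callsign, b.callsign)


# Hypothetical trace: each aircraft follows its own clearance, yet the
# pair ends up too close both laterally and vertically at step 1.
trace = [
    [Aircraft("A1", 0, 0, 30000), Aircraft("B2", 10, 0, 30000)],
    [Aircraft("A1", 3, 0, 30000), Aircraft("B2", 7, 0, 30500)],
]
print(list(violations(trace)))
```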
Figure 3. General Form of a Model of Socio-Technical Safety Control
The model in Figure 3 has two basic hierarchical control structures (one for system development on the left and one for system operation on the right) with interactions between them. A spacecraft manufacturer, for example, might only have system development under its immediate control, but safety involves both development and operational use of the spacecraft, and neither can be accomplished successfully in isolation: safety must be designed into the physical system, and safety during operation depends partly on the original system design and partly on effective control over operations. Manufacturers must communicate to their customers the assumptions about the operational environment upon which their safety analysis and design was based, as well as information about safe operating
procedures. The operational environment, in turn, provides feedback to the manufacturer about the performance of the system during operations.
Between the hierarchical levels of each control structure, effective communication channels are needed, both a downward reference channel providing the information necessary to impose constraints on the level below and a measuring channel to provide feedback about how effectively the constraints were enforced. For example, company management in the development process structure may provide a safety policy, standards, and resources to project management and in return receive status reports, risk assessment, and incident reports as feedback about the status of the project with respect to the safety constraints.
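The two channels between levels can be sketched as a small recursive structure in which constraints flow down a reference channel and incident reports flow up a measuring channel. This is a hypothetical toy model: the level names follow the development-side example in the text, but representing constraints as plain strings and feedback as an incident count are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One level of a hypothetical hierarchical safety control structure."""
    name: str
    below: list = field(default_factory=list)        # lower levels controlled by this one
    constraints: list = field(default_factory=list)  # constraints in force at this level
    incidents: int = 0                                # incidents observed at this level

    def impose(self, constraints):
        """Reference channel: adopt constraints here, then pass the full set down."""
        for c in constraints:
            if c not in self.constraints:
                self.constraints.append(c)
        for child in self.below:
            child.impose(self.constraints)

    def measure(self):
        """Measuring channel: total incidents reported from this level and below."""
        return self.incidents + sum(child.measure() for child in self.below)


engineering = Level("engineering", incidents=2)
project = Level("project management", below=[engineering],
                constraints=["design reviews include hazard analysis"])
company = Level("company management", below=[project])

company.impose(["safety policy", "safety standards"])
print(engineering.constraints)

feedback = company.measure()
if feedback > 0:
    print(f"{company.name}: {feedback} incident(s) reported; review enforcement")
```

In this toy model, constraints imposed at the top are added to each lower level's own constraints, and the top level sees the two incidents reported from the engineering level. A missing or delayed measuring channel would leave the upper level unaware of how well its constraints are being enforced.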
The safety control structure often changes over time, which accounts for the observation that accidents in complex systems frequently involve a migration of the system toward a state where a small deviation (in the physical system or in human behavior) can lead to a catastrophe. The foundation for an accident is often laid years before. One event may trigger the loss, but if that event had not happened, another one would have. As an example, Figure 4 shows the changes over time that led to a water contamination accident in Canada where 2400 people became ill and 7 died (most of them children). The reasons why this accident occurred would take too many pages to explain and only a small part of the overall STAM