Safety in Integrated Systems

  • 8/11/2019 Safety in Integrated Systems

    1/21

    Safety in Integrated Systems Health Engineering and Management1

    Nancy G. Leveson

    Aeronautics and Astronautics

    Engineering Systems

    MIT

    Abstract: This paper describes the state of the art in system safety engineering and management along
    with new models of accident causation, based on systems theory, that may allow us to greatly expand the
    power of the techniques and tools we use. The new models consider hardware, software, humans,
    management decision-making, and organizational design as an integrated whole. New hazard analysis
    techniques based on these expanded models of causation provide a means for obtaining the information
    necessary to design safety into the system and to determine which are the most critical parameters to
    monitor during operations and how to respond to them. The paper first describes and contrasts the current
    system safety and reliability engineering approaches to safety and the traditional methods used in both
    these fields. It then outlines the new systems-theoretic approach being developed in Europe and the U.S.
    and the application of the new approach to aerospace systems, including a recent risk analysis and health
    assessment of the NASA manned space program management structure and safety culture that used the
    new approach.

    Reliability Engineering vs. System Safety

    Although safety has been a concern of engineering for centuries and references to designing for safety
    appear in the 19th century (Cooper, 1891), modern approaches to engineering safety appeared as a result
    of changes that occurred in the mid-1900s.

    In the years following World War II, the growth in military electronics gave rise to reliability
    engineering. Reliability was also important to NASA and our space efforts, as evidenced by the high
    failure rate of space missions in the late 1950s and early 1960s, and was adopted as a basic approach to
    achieving mission success. Reliability engineering is concerned primarily with failures and failure rate
    reduction. The reliability engineering approach to safety thus concentrates on failure as the cause of
    accidents. A variety of techniques are used to minimize component failures and thereby the failures of
    complex systems caused by component failure. These reliability engineering techniques include parallel
    redundancy, standby sparing, built-in safety factors and margins, derating, screening, and timed
    replacements.

    System safety arose around the same time, primarily in the defense industry. Although many of the
    basic concepts of system safety, such as anticipating hazards and accidents and building in safety, predate
    the post-World War II period, much of the early development of system safety as a separate discipline
    began with flight engineers immediately after the War and was developed into a mature discipline in the
    early ballistic missile programs of the 1950s and 1960s. The space program was the second major
    application area to apply system safety approaches in a disciplined manner. After the Apollo 204 fire in
    1967, NASA hired Jerome Lederer, the head of the Flight Safety Foundation, to head manned space-flight
    safety and, later, all safety activities. Through his leadership, an extensive program of system safety was
    established for space projects, much of it patterned after the Air Force and DoD programs.

    While traditional reliability engineering techniques are often effective in increasing reliability, they do
    not necessarily increase safety. In fact, their use under some conditions may actually reduce safety. For
    example, increasing the burst-pressure to working-pressure ratio of a tank often introduces new dangers
    of an explosion or chemical reaction in the event of a rupture.

    1 This paper is a draft paper for the NASA Ames Integrated System Health Engineering and Management Forum,
    November 2005. The research reported in the paper was partly supported by the Center for [...]

    System safety, in contrast to the reliability engineering focus on preventing failure, is primarily
    concerned with the management of hazards: their identification, evaluation, elimination, and control
    through analysis, design and management procedures. In the case of the tank rupture, system safety would
    look at the interactions among the system components rather than just at failures or engineering strengths.

    Reliability engineers often assume that reliability and safety are synonymous, but this assumption is
    true only in special cases. In general, safety has a broader scope than failures, and failures may not
    compromise safety. There is obviously an overlap between reliability and safety, but many accidents
    occur without any component failure, e.g., the Mars [...]

    Techniques such as fault tree analysis or failure modes and effects analysis attempt to delineate all the
    relevant chains of events that can lead to the loss event being investigated.

    The event chain models that result from this analysis form the basis for most reliability and safety
    engineering design techniques, such as redundancy, overdesign, safety margins, interlocks, etc. The
    designer attempts to interrupt the chain of events by preventing the occurrence of events in the chain.
    Figure 1 is annotated with some potential design techniques that might be used to interrupt the flow of
    events to the loss state. Not shown, but very common, is to introduce "and" relationships in the chain, i.e.,
    to design such that two or more failures must happen for the chain to continue toward the loss state, thus
    reducing the probability of the loss occurring.
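    The effect of an "and" relationship on the probability of the loss can be illustrated numerically. The
    sketch below assumes the failures are statistically independent and uses made-up per-demand failure
    probabilities; the function name, the valve scenario, and the numbers are illustrative assumptions,
    not taken from the paper.

```python
from math import prod


def chain_probability(event_probs):
    """Probability that every event in an AND group occurs.

    Assumes the events are statistically independent, which is an
    assumption of this sketch and often does not hold in practice.
    """
    return prod(event_probs)


# Hypothetical failure probabilities, chosen only for illustration.
single_valve = [1e-3]             # the chain continues if one valve fails
two_valves_in_and = [1e-3, 1e-3]  # the chain continues only if both fail

p_single = chain_probability(single_valve)
p_and = chain_probability(two_valves_in_and)
print(p_single, p_and)
```

    If the two failures share a common cause, the independence assumption breaks down and the real
    probability can be much higher than the product suggests.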

    Figure 1. Chain-of-Events Accident Model Example

    Reliability engineering relies on redundancy, increasing component integrity (e.g., incorporating safety
    margins for physical components and attempting to achieve error-free behavior of the logical and human
    components), and using "safety" functions during operations to recover from failures (e.g., shutdown and
    other types of protection systems). System safety uses many of the same techniques, but focuses them on
    eliminating or preventing hazards. System safety engineers also tend to use a wider variety of design
    approaches, including various types of interlocks to prevent the system from entering a hazardous state or
    leaving a safe state.
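    An interlock of this kind can be sketched as a guard on state transitions. The example below is a
    minimal illustration under stated assumptions: the Press class, its methods, and the guarded-press
    scenario are hypothetical and do not come from the paper. One check prevents entry into a hazardous
    state (cycling with the guard open) and the other prevents leaving a safe state (opening the guard
    while the ram is moving).

```python
class InterlockError(Exception):
    """Raised when a command would put the system in a hazardous state."""


class Press:
    """Hypothetical machine whose hazard is 'ram moving while guard open'."""

    def __init__(self):
        self.guard_closed = True
        self.ram_moving = False

    def start_cycle(self):
        # Interlock: do not enter the hazardous state.
        if not self.guard_closed:
            raise InterlockError("cannot start a cycle while the guard is open")
        self.ram_moving = True

    def stop_cycle(self):
        self.ram_moving = False

    def open_guard(self):
        # Interlock: do not leave the safe state while the ram is moving.
        if self.ram_moving:
            raise InterlockError("cannot open the guard while the ram is moving")
        self.guard_closed = False

    def close_guard(self):
        self.guard_closed = True
```

    With these checks in place, no sequence of the four commands can reach the state in which the ram
    is moving and the guard is open.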

    In summary, reliability engineering focuses on failures while system safety focuses on hazards. These
    are not equivalent. C.O. Miller, one of the founders of system safety in the 1950s, cautioned that
    "distinguishing hazards from failures is implicit in understanding the difference between safety and
    reliability" (Miller, 1985).

    While both of these approaches work well with respect to their different goals for the relatively simple
    systems for which they were designed in the 1950s and 1960s, they are not as effective for the complex
    systems and new technology common today. The basic hazard analysis techniques have not changed
    significantly since Fault Tree Analysis was introduced in 1962. The introduction of digital systems and
    software control, in particular, has had a profound effect in terms of technology that does not satisfy the
    underlying assumptions of the traditional hazard and reliability analysis and safety engineering
    techniques. It also allows levels of complexity in our system designs that overwhelm the traditional
    approaches (Leveson 1995, Leveson 2005). A related new development is the introduction of distributed
    human-automation control and the changing role of human operators from controller to high-level
    supervisor and decision-maker. The simple slips and operational mistakes of the past are being eliminated
    by substituting automation, resulting in a change in the role humans play in accidents and the substitution
    of cognitively complex decision-making errors for the simple slips of the past. In the most technically
    advanced aircraft, the types of errors pilots make have changed but not been eliminated (Billings, 1996).

    Another important limitation of the chain-of-events model is that it ignores the social and
    organizational factors in accidents. Both the Challenger and Columbia accident reports stressed the
    importance of these factors in accident causation.

    Figure 2. Rasmussen Socio-Technical Model for Risk Management

    STAMP

    In STAMP [...] it misinterpreted
    noise from a sensor as an indication the spacecraft had reached the surface of the planet.

    Accidents such as these, involving engineering design errors and misunderstanding of the functional
    requirements (in the case of the Mars [...]

    development process, i.e., risk is not adequately managed in the design, implementation, and
    manufacturing processes. Control is also imposed by the management functions in an organization (the
    Challenger accident involved inadequate controls in the launch-decision process, for example) and by
    the social and political system within which the organization exists. The role of all of these factors must
    be considered in hazard and risk analysis.

    Note that the use of the term "control" does not imply a strict military-style command and control
    structure. Behavior is controlled or influenced not only by direct management intervention, but also
    indirectly by policies, procedures, shared values, and other aspects of the organizational culture. All
    behavior is influenced and at least partially "controlled" by the social and organizational context in which
    the behavior occurs. Engineering this context can be an effective way of creating and changing a safety
    culture.

    Systems are viewed in STAMP as interrelated components that are kept in a state of dynamic
    equilibrium by feedback loops of information and control. A system is not treated as a static design, but as
    a dynamic process that is continually adapting to achieve its ends and to react to changes in itself and its
    environment. The original design must not only enforce appropriate constraints on behavior to ensure safe
    operation, but it must continue to operate safely as changes and adaptations occur over time. Accidents,
    then, are considered to result from dysfunctional interactions among the system components (including
    both the physical system components and the organizational and human components) that violate the
    system safety constraints. The process leading up to an accident can be described in terms of an adaptive
    feedback function that fails to maintain safety as performance changes over time to meet a complex set of
    goals and values. The accident or loss itself results not simply from component failure (which is treated as
    a symptom of the problems) but from inadequate control of safety-related constraints on the development,
    design, construction, and operation of the socio-technical system.

    While events reflect the effects of dysfunctional interactions and inadequate enforcement of safety
    constraints, the inadequate control itself is only indirectly reflected by the events: the events are the
    result of the inadequate control. The system control structure itself, therefore, must be examined to
    determine how unsafe events might occur and if the controls are adequate to maintain the required
    constraints on safe behavior.

    STAMP has three fundamental concepts: constraints, hierarchical levels of control, and process
    models. These concepts, in turn, give rise to a classification of control flaws that can lead to accidents.

    The most basic component of STAMP is not an event, but a constraint. In systems theory and control
    theory, systems are viewed as hierarchical structures where each level imposes constraints on the activity
    of the level below it; that is, constraints or lack of constraints at a higher level allow or control
    lower-level behavior.

    Safety-related constraints specify those relationships among system variables that constitute the
    non-hazardous or safe system states: for example, the power must never be on when the access to the
    high-voltage power source is open, the descent engines on the lander must remain on until the spacecraft
    reaches the planet surface, and two aircraft must never violate minimum separation requirements.
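    The three example constraints can be written as predicates over system variables, which makes the
    idea of a safe system state concrete. This is only a sketch under stated assumptions: the State
    fields, the constraint names, and the separation threshold are hypothetical, and the three unrelated
    systems are folded into one state record purely for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class State:
    power_on: bool
    access_open: bool
    on_surface: bool
    descent_engines_on: bool
    separation_nm: float  # distance between two aircraft, nautical miles


MIN_SEPARATION_NM = 5.0  # illustrative value, not taken from the paper

# Each constraint returns True when the state is safe with respect to it.
CONSTRAINTS: Dict[str, Callable[[State], bool]] = {
    "no power while access open": lambda s: not (s.power_on and s.access_open),
    "engines on until surface": lambda s: s.on_surface or s.descent_engines_on,
    "minimum separation": lambda s: s.separation_nm >= MIN_SEPARATION_NM,
}


def violated(state: State) -> List[str]:
    """Return the names of all safety constraints the state violates."""
    return [name for name, holds in CONSTRAINTS.items() if not holds(state)]


# A lander whose engines shut down before it reaches the surface.
early_shutdown = State(power_on=False, access_open=False, on_surface=False,
                       descent_engines_on=False, separation_nm=8.0)
print(violated(early_shutdown))  # ['engines on until surface']
```

    Note that no component has to fail for a constraint to be violated; the check looks only at the
    relationship among the variables.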

    Instead of viewing accidents as the result of an initiating (root cause) event in a chain of events leading
    to a loss, accidents are viewed as resulting from interactions among components that violate the system
    safety constraints. The control processes that enforce these constraints must limit system behavior to the
    safe changes and adaptations implied by the constraints.

    Figure 3. General Form of a Model of Socio-Technical Safety Control

    The model in Figure 3 has two basic hierarchical control structures, one for system development
    (on the left) and one for system operation (on the right), with interactions between them. A spacecraft
    manufacturer, for example, might only have system development under its immediate control, but safety
    involves both development and operational use of the spacecraft, and neither can be accomplished
    successfully in isolation: Safety must be designed into the physical system, and safety during operation
    depends partly on the original system design and partly on effective control over operations.
    Manufacturers must communicate to their customers the assumptions about the operational environment
    upon which their safety analysis and design was based, as well as information about safe operating
    procedures. The operational environment, in turn, provides feedback to the manufacturer about the
    performance of the system during operations.

    Between the hierarchical levels of each control structure, effective communication channels are
    needed, both a downward reference channel providing the information necessary to impose constraints on
    the level below and a measuring channel to provide feedback about how effectively the constraints were
    enforced. For example, company management in the development process structure may provide a safety
    policy, standards, and resources to project management and in return receive status reports, risk
    assessment, and incident reports as feedback about the status of the project with respect to the safety
    constraints.
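    The two channels can be sketched as a small data structure in which each level passes constraints
    down and receives feedback from below. The ControlLevel class and its method names are hypothetical,
    introduced only to illustrate the company-management example; they are not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlLevel:
    """One level in a hypothetical hierarchical safety control structure."""
    name: str
    parent: Optional["ControlLevel"] = None
    constraints: List[str] = field(default_factory=list)  # received from above
    feedback: List[str] = field(default_factory=list)     # received from below

    def impose(self, child: "ControlLevel", constraint: str) -> None:
        # Downward reference channel: pass a constraint to the level below.
        child.parent = self
        child.constraints.append(constraint)

    def report(self, message: str) -> None:
        # Upward measuring channel: send feedback to the level above.
        if self.parent is None:
            raise ValueError(f"{self.name} has no level above it to report to")
        self.parent.feedback.append(f"{self.name}: {message}")


company = ControlLevel("company management")
project = ControlLevel("project management")
company.impose(project, "safety policy")
company.impose(project, "safety standards")
project.report("status report")
project.report("incident report")
```

    A level that imposes constraints but has an empty feedback list is missing its measuring channel,
    so it cannot tell how effectively its constraints are being enforced.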

    The safety control structure often changes over time, which accounts for the observation that accidents
    in complex systems frequently involve a migration of the system toward a state where a small deviation
    (in the physical system or in human behavior) can lead to a catastrophe. The foundation for an accident is
    often laid years before. One event may trigger the loss, but if that event had not happened, another one
    would have. As an example, Figure 4 shows the changes over time that led to a water contamination
    accident in Canada where 2400 people became ill and 7 died (most of them children). The reasons why
    this accident occurred would take too many pages to explain and only a small part of the overall STAMP