
© Erik Hollnagel, 2010

The ETTO Principle - Efficiency-Thoroughness Trade-Off

Or Why Things That Go Right, Sometimes Go Wrong

Erik Hollnagel
Professor & Industrial Safety Chair, MINES ParisTech, CRC, Sophia Antipolis, France
Professor II, NTNU, Trondheim, Norway
E-mail: [email protected]


Three approaches to safety

Timeline 1850 - 1900 - 1950 - 2000:

Age of technology: engineering approach (design, automation; elimination, protection).
Age of human factors: Human Factors approach (human reliability).
Age of safety management: organisational approach (HRO, safety culture).


Maritime accidents

Main causes of maritime accidents (1987-1991): officer error 27%, crew error 16%, shore control error 10%, pilot error 7%, other 10%; the remaining categories are structural error and equipment and mechanical error.

Total “human error”: 70%


A persistent myth


Failures or successes?

When something goes wrong, e.g., 1 event out of 10,000 (10^-4), humans are assumed to be responsible in 80-90% of the cases. Who or what is responsible for the remaining 10-20%?

When something goes right, e.g., 9,999 events out of 10,000, are humans also responsible in 80-90% of the cases?

Investigation of failures is accepted as important. Investigation of successes is rarely undertaken.


Nature of technical (formal) systems

They can be described bottom-up in terms of components and subsystems. Risks and failures can therefore be analysed relative to individual components and events.

Decomposition works for technical systems, because they have been designed.

Outputs (effects) are proportional to inputs (causes) and predictable from knowledge of the components. Technical systems are linear and event outcomes are tractable.

Many systems are identical.


Today’s systems are socio-technical

Risks cannot be assigned to identifiable parts of the systems’ structure. Risks must be seen in relation to the systems’ functions.

The conditions for successful functioning are created by the interaction between social and technical factors.


Nature of socio-technical systems

Must be described top-down in terms of functions and objectives. Risks and failures must therefore be described relative to functional wholes.

Decomposition does not work for socio-technical systems, because they are emergent.

Complex relations between input (causes) and output (effects) give rise to unexpected and disproportionate consequences. Socio-technical systems are non-linear and event outcomes are intractable.

All systems are unique.


Tractable and intractable systems

Tractable system (technical, clockwork) versus intractable system (socio-technical, teamwork):

Comprehensibility: principles of functioning are known / principles of functioning are partly unknown.
Stability: the system does not change while being described / the system changes before the description is completed.
Complicacy: descriptions are simple with few details / descriptions are elaborate with many details.
Degree of specification: fully specified / partly specified (underspecified).


Performance variability is necessary

Most socio-technical systems are intractable. Conditions of work therefore never completely match what has been specified or prescribed. Systems are so complex that work situations are always underspecified, and hence partly unpredictable.

Few, if any, tasks can successfully be carried out unless procedures and tools are adapted to the situation. Performance variability is both normal and necessary.

Individuals, groups, and organisations habitually adjust their work to meet existing conditions (resources, demands, conflicts, interruptions). Resources (time, manpower, information, etc.) are always limited. Adjustments will therefore be approximate rather than exact.

Performance variability leads to both acceptable and unacceptable outcomes.


Risk profile (= lack of safety)

Consequence \ Probability: Rare | Unlikely | Possible | Likely | Certain
Very critical: High | High | Extreme | Extreme | Extreme
Critical: Moderate | Moderate | High | High | Extreme
Marginal: Low | Low | Moderate | High | High
Negligible: Low | Low | Low | Moderate | Moderate
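The matrix can be read as a simple lookup from a consequence/probability pair to a risk rating. A minimal sketch of that reading is below: the ratings are taken from the slide, while the names (RISK_MATRIX, risk_level) are invented for illustration. The benefit profile on the next slide has exactly the same structure.

```python
# Illustrative lookup for the risk matrix above; names and structure are my own.
RISK_MATRIX = {
    "Very critical": ["High", "High", "Extreme", "Extreme", "Extreme"],
    "Critical":      ["Moderate", "Moderate", "High", "High", "Extreme"],
    "Marginal":      ["Low", "Low", "Moderate", "High", "High"],
    "Negligible":    ["Low", "Low", "Low", "Moderate", "Moderate"],
}
PROBABILITIES = ["Rare", "Unlikely", "Possible", "Likely", "Certain"]

def risk_level(consequence: str, probability: str) -> str:
    """Return the risk rating for a consequence/probability pair."""
    return RISK_MATRIX[consequence][PROBABILITIES.index(probability)]

print(risk_level("Critical", "Likely"))  # -> High
```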


Benefit profile (= safety)

Consequence \ Probability: Rare | Unlikely | Possible | Likely | Certain
Innovative: High | High | Very high | Very high | Very high
Effective: Moderate | Moderate | High | High | Very high
Acceptable: Low | Low | Moderate | High | High
Negligible: Low | Low | Low | Moderate | Moderate


Range of event outcomes

[Diagram: event outcomes plotted by predictability (very high to very low) and outcome type (positive, neutral, negative): normal outcomes (things that go right), serendipity, good luck, near misses, mishaps, incidents, accidents, disasters.]


Frequency of event outcomes

[Diagram: the same outcome/predictability layout as above, with order-of-magnitude frequency markers (10^2, 10^4, 10^6) along the range from near misses and incidents to accidents and disasters; normal outcomes (things that go right), serendipity and good luck lie on the positive side.]


Why only look at what goes wrong?

Looking at what goes wrong: the focus is on what goes wrong. Look for failures and malfunctions; try to eliminate causes and improve barriers. Safety = reduced number of adverse events (10^-4 := 1 failure in 10,000 events). Safety and core business compete for resources. Learning only uses a fraction of the data available.

Looking at what goes right: the focus is on what goes right. Use it to understand normal performance, to do better and to be safer. Safety = ability to succeed under varying conditions (1 - 10^-4 := 9,999 non-failures in 10,000 events). Safety and core business help each other. Learning uses most of the data available.


Being safe versus being unsafe

[Diagram: the same outcome/predictability layout as above, divided into safe functioning (things that go right), which is invisible, and unsafe functioning (near misses, incidents, accidents, disasters), which is visible.]


Efficiency-Thoroughness Trade-Off

Thoroughness is time to think: recognising the situation, choosing and planning. If thoroughness dominates, there may be too little time to carry out the actions: pending actions are neglected and new events are missed.

Efficiency is time to do: implementing plans and executing actions. If efficiency dominates, actions may be badly prepared or wrong: pre-conditions are missed and only the expected results are looked for.

The trade-off is between the time and resources needed and the time and resources available.
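As a minimal sketch (my own illustration, not from the slides) of this balance: when the time needed for thinking and doing exceeds the time available, something is cut, and under efficiency pressure it is usually the thinking. The function name and all numbers are hypothetical.

```python
# Toy model of the efficiency-thoroughness trade-off under time pressure.
def plan_work(time_available: float, time_to_think: float, time_to_do: float) -> dict:
    """Split the available time between thinking (thoroughness) and doing (efficiency).

    If both do not fit, thinking is sacrificed first, mirroring the usual bias
    towards efficiency; the remaining shortfall is reported as a deficit.
    """
    needed = time_to_think + time_to_do
    if time_available >= needed:
        return {"think": time_to_think, "do": time_to_do, "deficit": 0.0}
    think = max(0.0, time_available - time_to_do)   # cut thinking first
    do = min(time_to_do, time_available)            # then, if needed, cut doing
    return {"think": think, "do": do, "deficit": needed - time_available}

print(plan_work(time_available=30.0, time_to_think=20.0, time_to_do=25.0))
# -> {'think': 5.0, 'do': 25.0, 'deficit': 15.0}
```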


Some ETTO heuristics

Idiosyncratic (work related): Looks fine. Not really important. Normally OK, no need to check. Will be checked by someone else. Has been checked by someone else. No time (or resources) to do it now. Can't remember how to do it. We always do it this way. I've done it millions of times before. This way is much quicker. It looks like X (so it probably is X). We must get this done. Must be ready in time. Must not use too much of X.

Cognitive (individual): Judgement under uncertainty. Cognitive primitives (SM - similarity matching, FG - frequency gambling). Reactions to information input overload and underload. Cognitive style. Confirmation bias. Rejecting conflicting information.

Collective (organisation): Negative reporting. Reducing redundancy. Meeting “production” targets. Reducing unnecessary cost. Double-binds.


Thoroughness takes time

“In splitting a board, a circular-saw operator suffered the loss of his thumb when, in violation of instructions, he pushed the board past the saw with his fingers, instead of using the push stick that had been provided for the purpose.”

“He stated that he had always done such work in this manner and had never before been hurt. He had performed similar operations on an average of twenty times a day for three months and had therefore exposed his hand in this way over one thousand five hundred times.” (Heinrich, 1931, Industrial Accident Prevention)

The prescribed (thorough) way of working: grab board → put on saw table → grab push stick → put on board → push past saw → remove push stick → remove board.


Pecking for food

In many animals the conflict between scanning for predators and foraging appears clear, because vigilance and foraging both require time and visual attention, which are limited resources.

Head-down (foraging): food intake maximum, risk high.
Head-up (vigilant): food intake zero, risk low.
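As an illustration of this trade-off (my own sketch, not from the slide), the toy model below uses the fraction of time spent head-down as the single trade-off parameter; the intake rate, attack probability and the assumption that attacks are only noticed while head-up are invented for the example.

```python
# Toy model of the vigilance/foraging trade-off.
INTAKE_RATE = 10.0   # food units per hour while head-down (assumed)
ATTACK_PROB = 0.05   # predator attacks per hour (assumed)

def forage(head_down_fraction: float, hours: float = 1.0) -> dict:
    """Expected food and expected undetected attacks for a given time split.

    An attack is assumed to be noticed only while head-up, so the chance that
    it goes undetected equals the fraction of time spent head-down.
    """
    assert 0.0 <= head_down_fraction <= 1.0
    food = INTAKE_RATE * head_down_fraction * hours
    undetected = ATTACK_PROB * head_down_fraction * hours
    return {"food": food, "undetected_attacks": undetected}

for fraction in (0.0, 0.5, 1.0):
    print(fraction, forage(fraction))
# 0.0: no food, no exposure; 1.0: maximum food, maximum exposure.
```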


Efficiency-Thoroughness Trade-Off

One way of managing time and resource limitations is to think only one step back and/or one step ahead.

Thoroughness: confirm that the input is correct; consider secondary outcomes and side-effects.
Efficiency: trust that the input is correct; assume someone else takes care of the outcomes.

For distributed work it is necessary to trust what others do; it is impossible to check everything.
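The “one step back” choice can be pictured as a small processing pipeline in which every stage either verifies its input (thorough, slower) or trusts it (efficient, faster, but upstream errors persist). This is my own illustrative sketch; the stage names, costs and error model are hypothetical.

```python
# Illustrative pipeline: each stage either verifies or trusts its input.
def run_pipeline(stages, verify: bool) -> dict:
    """With verify=True every stage checks its input (extra time, upstream errors
    are caught); with verify=False upstream errors are passed along."""
    time_spent, errors = 0, 0
    for _name, work_cost, check_cost, introduces_error in stages:
        if verify:
            time_spent += check_cost   # thoroughness: confirm that the input is correct
            errors = 0                 # an error made upstream is caught and corrected
        if introduces_error:
            errors += 1
        time_spent += work_cost
    return {"time": time_spent, "residual_errors": errors}

STAGES = [("prepare", 2, 1, False), ("assemble", 3, 1, True), ("pack", 1, 1, False)]
print(run_pipeline(STAGES, verify=True))   # -> {'time': 9, 'residual_errors': 0}
print(run_pipeline(STAGES, verify=False))  # -> {'time': 6, 'residual_errors': 1}
```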


Mutual optimism

In distributed work everyone may reason in the same way: “I can allow myself to be efficient because the others will be thorough.”
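A toy calculation (my own illustration) of why mutual optimism is fragile: if each person checks less because they assume someone else will be thorough, the chance that nobody checks can remain uncomfortably high. The probabilities below are assumptions for illustration only.

```python
# Toy model of mutual optimism with independent, individual checking decisions.
def p_nobody_checks(n_people: int, p_check_each: float) -> float:
    """Probability that no one performs the check, assuming independence."""
    return (1.0 - p_check_each) ** n_people

# A lone worker who knows the check is theirs alone:
print(round(p_nobody_checks(1, 0.95), 3))    # -> 0.05
# Six people who each check only occasionally, trusting the other five:
print(round(p_nobody_checks(6, 0.30), 3))    # -> 0.118, still about a 1-in-8 chance
```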


Individual ETTO

Condition: On a cold January afternoon, a customer handed in a prescription at a drug store. Workload was high due to illness among the staff, many customers (20-30) were waiting (30-40 minutes), and only one computer terminal was available.

Thoroughness: Register the prescription in the computer system by drug name before fetching the drug from the supply; verify the drug and dose in a dialogue with the customer.
Efficiency: Avoid letting the customer wait by fetching the drug from the supply first and registering it later. The customer, irritated by the long wait, declined the dialogue.

Consequence: The customer received the wrong drug (Estradiol instead of Efedrin). The mistake was realised before the drug was used, and the pharmacist was blamed for not following procedures.


Collective ETTO: Stena Nautica

Condition: To prevent foundering after a collision, the ship was divided into sections with watertight doors between them.
Thoroughness: Open and close the watertight doors every time you pass through them.
Efficiency: Leave some doors open to make passage easier.

At 04:30 on February 16, 2004, the Polish cargo ship “Joanna” rammed the Swedish ferry “Stena Nautica” on the starboard side after a failed evasive manoeuvre. The collision created an opening into the cargo deck and engine room. Due to engine failure the vessel was abandoned, but it was later towed safely into Falkenberg.

Consequence: On “Stena Nautica”, nine watertight doors were open. This had incorrectly been permitted by a ship's inspector; it was later found that nine ferries worked in the same way. The system for automatic door closing failed because it was activated late and because the electrical system was not waterproof.


Organisational ETTO

Hurricane Camille (1969) knocked two platforms to the sea floor and moved an adjacent platform down slope. The platform was declared a constructive loss and an insurance claim was made. All of the platforms, however, were salvaged.

An initial evaluation indicated the potential for movements of the sea floor. A study was initiated to investigate this phenomenon and evaluate the risks. Design of conventional platforms was commissioned on the premise that, if the likelihood of sea floor movements was found to be too high, the platforms would be sited elsewhere.

Thoroughness: Do not complete the design or begin construction before the study is completed. Follow the recommendation not to site the conventional platforms at the proposed location.
Efficiency: Start construction of the platforms so the project will be on schedule if the study indicates that the site is safe. Dispute that there is sufficient evidence to make the recommendation valid. Assume the site is stable until proven unstable.


ETTO – successes and failures

In the vast majority of cases, things go right (outcome is as expected). This is taken for granted and therefore rarely analysed or investigated.

Sometimes things go wrong (outcome is not as expected). If losses (time, material, money or life) are serious, this is investigated to find the cause.

Things go right and things go wrong for the same reasons. Efficiency-thoroughness trade-offs are both normal and necessary.

An ETTO is always approximate, because of the very reasons that make it necessary (insufficient time and information!). But making an efficiency-thoroughness trade-off is never wrong in itself!

People are expected to be both efficient and thorough at the same time – or rather to be thorough when, with hindsight, it was wrong to be efficient.

The greater the need for performance adjustments, the less precise the adjustments are likely to be. Demands for efficiency may overrule thoroughness.


Reasons for ETTOing (example)


The ETTO-TETO paradox

In order to be resilient, short-term ETTO must be balanced by long-term TETO (Thoroughness-Efficiency Trade-Off).

Efficiency in the present requires thoroughness in the past; efficiency in the future requires thoroughness in the present. In both cases the thoroughness consists of learning and of preparing indicators and responses.


Two approaches to safety management

The goal of safety management is to reduce the number of things that go wrong (unsafe). Solution: constrain performance by rules, procedures, barriers, and defences.

The goal of resilience management is to increase the number of things that go right (safe). Solution: enhance the abilities to respond, monitor, anticipate and learn.


Curious to learn more?

A Practical Introduction to the Functional Resonance Analysis Method (FRAM): 2-day course at NTNU, Trondheim, Norway, 20th - 21st September 2010. Contact: [email protected], [email protected]

Resilience engineering and the Functional Resonance Analysis Method (FRAM): St. Andrews University, Thursday August 26.


Any questions? (before it is too late)

Thank you for your attention