Common Operations Failure Modes in the Process Industries...

Dr. Peter Bullemer, Human Centered Solutions
Jason Laberge, Honeywell

2009 International Symposium: Beyond Regulatory Compliance, Making Safety Second Nature
October 27-28, 2008, College Station, TX USA

Paper presented on behalf of the Abnormal Situation Management® R&D Consortium

ASM® and Abnormal Situation Management® are registered trademarks of Honeywell International


Page 1: Common Operations Failure Modes in the Process Industries …psc.tamu.edu/files/symposia/2009/presentations/2 Bullemer.pdf · Identify Common Failures. Identify Common Root Causes.



Page 2

Authors

• Dr. Peter Bullemer
– Senior partner, North American, human factors consulting group, Human Centered Solutions, LLP
– Specializes in human performance in process industry operations
– Technical Contributor to the ASM® Consortium since 1994
• Jason Laberge
– Principal Investigator for the Abnormal Situation Management® (ASM®) Consortium
– Lead human factors researcher for ASM since 2005
– Research focuses on understanding the factors that influence performance in complex systems


Page 3

Abnormal Situation Management®: A Joint Research and Development Consortium

Human Centered Solutions: Helping People Perform

Founded in 1994

Creating a new paradigm for the operation of complex industrial plants, with solution concepts that improve Operations’ ability to prevent and respond to abnormal situations.

www.asmconsortium.org


Page 4

Message

• The typical approach to incident analysis does NOT effectively identify the impact of ineffective operations practices
• This paper illustrates a methodology to identify systemic operations practice failure modes
• And improve human reliability associated with plant operations practices


Page 5

Project Objectives

• Understand relation between ineffective operations practices and process industry incidents
– Systematically analyze incidents to determine common operational practice failure modes
– Identify root causes of common operational practice failure modes
– Why do failures occur ACROSS incidents

This research study was sponsored by the Abnormal Situation Management® (ASM®) Consortium.


Page 6

Incident Selection

[Figure: # of Incidents (axis 0 to 100) and Cumulative % (axis 0% to 100%) by country: USA, Canada, UK, Korea, India, Other Non-USA, Germany, Algeria, Australia, Brazil, France, Italy, Kuwait, Mexico]

• Identified 123 candidate incidents (99 public, 24 site)
• Priority given to recent refining/chemical incidents with severe consequences and detailed reports
• Selected 32 incidents for the study

           Public   Site   Total
USA          14       7      21
Non USA       6       5      11
Total        20      12      32


Page 7

Operational Failures

• Failure is any operational practice flaw that, if corrected, could have prevented the incident from occurring or would have significantly mitigated its consequences
– ‘What’ went wrong in the specific incident in the investigation team’s own language/terms
– Example: Supervisor not accessible
• Common failure modes are shared operational practice failures across incidents
– Common problems for the industry (or site)
– Failures map to ASM Effective Operations Practices Guidelines
– Example: Ineffective first line leadership roles


Page 8

Common Failure Modes

Top 10 Operations Failures            #      %
Hazard analysis/communication        79    15%
First-line leadership                65    12%
Continuous improvement               60    11%
Safety culture                       36     7%
Initial and refresher training       30     6%
Task communications                  29     5%
Comprehensive MOC                    28     5%
Cross functional communication       23     4%
Compliance with procedures           15     3%
Design guidelines and standards      14     3%
Other failure modes                 160    30%
TOTAL                               539

• Top 10 covers 70% of identified operations practice failures
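The percentages and the 70% coverage figure follow from the counts in the Top 10 table. A minimal Python sketch that recomputes them is shown here; the helper name `pct_half_up` and the rounding convention (whole percent, halves rounded up) are assumptions for illustration, since the slides do not say how the percentages were rounded.

```python
# Counts copied from the Top 10 Operations Failures table.
FAILURE_MODE_COUNTS = {
    "Hazard analysis/communication": 79,
    "First-line leadership": 65,
    "Continuous improvement": 60,
    "Safety culture": 36,
    "Initial and refresher training": 30,
    "Task communications": 29,
    "Comprehensive MOC": 28,
    "Cross functional communication": 23,
    "Compliance with procedures": 15,
    "Design guidelines and standards": 14,
    "Other failure modes": 160,
}


def pct_half_up(count: int, total: int) -> int:
    """Whole-number percentage of count in total, halves rounded up (assumed convention)."""
    return (200 * count + total) // (2 * total)


total = sum(FAILURE_MODE_COUNTS.values())
top10 = sum(n for mode, n in FAILURE_MODE_COUNTS.items() if mode != "Other failure modes")
shares = {mode: pct_half_up(n, total) for mode, n in FAILURE_MODE_COUNTS.items()}

for mode, n in FAILURE_MODE_COUNTS.items():
    print(f"{mode:<34}{n:>4}{shares[mode]:>5}%")
print(f"Top 10 coverage: {top10} of {total} = {pct_half_up(top10, total)}%")
```

Integer arithmetic is used instead of floating-point rounding so that results near a half-percent boundary do not depend on binary representation.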


Page 9

Key Learning from Project

• The explicit focus on operating practice failures identified opportunities to reduce incident risk that traditional investigation approaches may not identify


Page 10

Key Learning from Project

• BP Texas City incident (March 23, 2005) investigation reports (Baker, CSB, BP) failed to fully identify the following operating practice failures:
» Task-oriented collaborative communication (i.e., team coordination and real-time communication)
» Training for situation management and team collaboration (i.e., CRM training)
» Need for a common console operator interface framework that supports all operator interaction requirements
• Note: this investigation was not typical in level of detail and scope of coverage

Image from BP Incident Report (2006)


Page 11

Key Learning from Project

• Typical analyses that focus on just root causes are insufficient for identifying systemic improvement opportunities:
– Root causes explain ‘why’ something occurred, not ‘what’ occurred in terms of failures
– Root causes are general and not specific enough to drive continuous improvement; details are buried in the incident report
– No effective methods for aggregating root cause details across incidents for systemic analysis of problems and improvements

[Figure: Event 1, Event 2, Event N, Event N+1 and the Incident, with Root Causes attached to events. Labels: ‘Why’ event occurred; Missing ‘What’ went wrong; How to aggregate details within and across incidents?]


Page 12

Some Definitions

• Incident failure is any operational practice flaw that, if corrected, could have prevented the incident from occurring or would have significantly mitigated its consequences

– ‘What’ went wrong in the specific incidents and often in the investigation team’s own language/terms

– In the research project incident failures were identified based on incident reports

– Example: Supervisor did not check procedure progress


Page 13

Some Definitions

• Common failures are shared operational practice failures across incidents
– Common problems for the industry (or site)
– In the research project common failures map to ASM Effective Operations Practices Guidelines
– Example: Ineffective first line leadership


Page 14

Some Definitions

• A root cause is the most basic cause (or causes) that can reasonably be identified that management has control to fix and, when fixed, will prevent (or significantly reduce the likelihood of) the failure’s (or factor’s) recurrence
– ‘Why’ a failure occurred
– In the research project root causes were based on TapRoot®
– An operations failure mode may have more than one root cause
– Example: No Supervision and No communication may both result in Ineffective first line leadership failure mode


Page 15

Some Definitions

• Root cause manifestations are the specific expression or indication of a root cause in an incident
– ‘How’ operational failure modes are expressed in real operations settings; these are the root cause details aggregated across incidents
– Basis for creating audit checklist to proactively look for operational risks
– Example: Supervisor not in control room to discuss problems is an example manifestation for the No Supervision common root cause and the Ineffective First Line Leadership Role common failure mode


Page 16

Relation of Failures to Root Causes to Manifestations

[Figure: tree diagram. Incident 1 and Incident 2 each branch into Failure 1, Failure 2, …, Failure N; each failure has one or more Root Causes; each Root Cause has a Manifestation.]
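The incident, failure, root cause, and manifestation hierarchy can be written down as a small data model. The sketch below is one possible encoding, not the study's actual tooling: the class and field names are hypothetical, the incident label is a placeholder, and pairing the "Supervisor not accessible" example with the "No Supervision" root cause is an assumption made only to populate the example.

```python
from dataclasses import dataclass, field


@dataclass
class RootCause:
    name: str            # 'why' the failure occurred
    manifestation: str   # 'how' the root cause showed up in this incident


@dataclass
class IncidentFailure:
    description: str     # 'what' went wrong, in the investigation team's words
    common_failure: str  # the common failure mode the description maps to
    root_causes: list[RootCause] = field(default_factory=list)


@dataclass
class Incident:
    name: str
    failures: list[IncidentFailure] = field(default_factory=list)


# Example values taken from the definitions slides; the incident name is a placeholder.
example = Incident(
    name="Example incident",
    failures=[
        IncidentFailure(
            description="Supervisor not accessible",
            common_failure="Ineffective first line leadership",
            root_causes=[
                RootCause(
                    name="No Supervision",
                    manifestation="Supervisor not in control room to discuss problems",
                ),
            ],
        )
    ],
)

# Flatten the tree into one row per root cause, ready for aggregation across incidents.
rows = [
    (example.name, failure.description, failure.common_failure, cause.name, cause.manifestation)
    for failure in example.failures
    for cause in failure.root_causes
]
for row in rows:
    print(" | ".join(row))
```

Keeping root causes as a list under each failure reflects the point that an operations failure mode may have more than one root cause.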


Page 17

Relation of Failures to Root Causes to Manifestations

[Figure: tree of Incident 1 and Incident 2, each with Failures, Root Causes, and Manifestations; a "Common Failures" grouping spans the failure level of both incidents.]

• Incident failures are often in the analyst’s own language so some kind of mapping must occur to determine common failures
– In the research project, the team mapped the incident failures to the ASM Effective Operations Practices Guidelines


Page 18

Relation of Failures to Root Causes to Manifestations

[Figure: tree of Incident 1 and Incident 2, each with Failures, Root Causes, and Manifestations; "Common Failures" and "Common Root Causes" groupings span both incidents.]

• In the research project, Common Root Causes were simply the count and relative frequency of the TapRoot® root causes across the incident sample


Page 19

Relation of Failures to Root Causes to Manifestations

[Figure: tree of Incident 1 and Incident 2, each with Failures, Root Causes, and Manifestations; "Common Failures", "Common Root Causes", and "Common Manifestations" groupings span both incidents.]


Page 20

Relation of Failures to Root Causes to Manifestations

[Figure: three levels arranged along a General-to-Specific scale: Common Failures (What?), Common Root Causes (Why?), Common Manifestations (How?). Annotations: "What to focus on?" and "How to address problems?"]

• Data at all three levels is needed to:
– Focus improvement on common and systemic problems
– Understand why problems occur and develop improvement programs and corrective actions to address real root causes


Page 21

Relation of Failures to Root Causes to Manifestations

Incident: Texas City
Incident Failure: Shift Supervisor did not ensure procedures were being followed
Common Failure: Effective first line leadership
– Common Root Cause: No Supervision
  Manifestation: Supervisor did not check procedure progress before leaving site
  Common Manifestation: Checking procedure progress for area of responsibility

Incident: Texas City
Incident Failure: It was not clear who was in charge when supervisor was gone
Common Failure: Effective first line leadership
– Common Root Cause: No Communication
  Manifestation: Supervisor did not communicate with personnel that he was leaving the site
  Common Manifestation: Bi-directional communication of status between supervisors and operators
– Common Root Cause: Accountability needs improvement
  Manifestation: No policy that outlines responsibilities when supervisor leaves the site
  Common Manifestation: Unclear policy for supervisor requirements and expectations

Incident: Esso Longford
Incident Failure: No permit was issued or reviewed for the maintenance work
Common Failure: Effective first line leadership
– Common Root Cause: Standards, Policies, Admin Controls (SPAC) not followed
  Manifestation: Presence of field operator was assumed to remove need for permit
  Common Manifestation: Enforcing practices/procedures across the site
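The aggregation step described earlier, where common root causes are the count and relative frequency of root causes across the incident sample, can be sketched on the four example rows from this table. This is an illustration only: the tuple layout and variable names are hypothetical, the SPAC root cause name is abbreviated, and the row for "Accountability needs improvement" is assumed to share the incident and common failure of the row above it, because those cells are blank in the source.

```python
from collections import Counter

# (incident, common failure, common root cause, common manifestation)
ROWS = [
    ("Texas City", "Effective first line leadership", "No Supervision",
     "Checking procedure progress for area of responsibility"),
    ("Texas City", "Effective first line leadership", "No Communication",
     "Bi-directional communication of status between supervisors and operators"),
    ("Texas City", "Effective first line leadership", "Accountability needs improvement",
     "Unclear policy for supervisor requirements and expectations"),
    ("Esso Longford", "Effective first line leadership", "SPAC not followed",
     "Enforcing practices/procedures across the site"),
]

# Count and relative frequency of root causes across the sample.
root_cause_counts = Counter(root_cause for _, _, root_cause, _ in ROWS)
total = sum(root_cause_counts.values())
for root_cause, count in root_cause_counts.most_common():
    print(f"{root_cause}: {count} ({count / total:.0%})")

# Common manifestations grouped under each (common failure, root cause) pair.
manifestations: dict[tuple[str, str], list[str]] = {}
for _, failure, root_cause, manifestation in ROWS:
    manifestations.setdefault((failure, root_cause), []).append(manifestation)

# Number of distinct incidents contributing to each common failure.
incidents_per_failure: dict[str, set[str]] = {}
for incident, failure, _, _ in ROWS:
    incidents_per_failure.setdefault(failure, set()).add(incident)
```

With only four rows every root cause appears once; the same code applied to a full incident sample would produce the kind of frequency tables shown elsewhere in the presentation.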


Page 22

ASM Effective Practice Work Process Description

Work process steps and their outputs, in order:

• Review Incident Reports: list of operational failures, root causes
• Identify Common Failures: cluster list of failures per practice standards; list of top practice failure modes (covers at least 50%)
• Identify Common Root Causes: list top root causes for each failure mode
• Identify Common Manifestations: cluster manifestations associated by root cause; consolidate list to highlight common elements
• Analyze Gaps in Systems: list of weaknesses in management systems and practice standards
• Define Practice Improvements: list of prioritized solutions (cost, impact, etc)
• Implement Practice Changes: generate improvement action plan to make changes per priority & resource constraints
• Monitor Impact of Changes: use leading/lagging metrics to track

Site elements shown alongside the process: Site Incident Reports, Site Practice Standards, Site Continuous Improvement Program
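The "covers at least 50%" criterion for the list of top practice failure modes can be made concrete with a short selection routine. The sketch below applies it to the failure mode counts reported in this presentation; the function name `top_modes_covering` and its parameters are hypothetical, and treating the "Other failure modes" bucket as part of the total but not as a candidate is an assumption.

```python
# Named failure mode counts from the Top 10 table; "Other" is kept separately.
NAMED_MODE_COUNTS = {
    "Hazard analysis/communication": 79,
    "First-line leadership": 65,
    "Continuous improvement": 60,
    "Safety culture": 36,
    "Initial and refresher training": 30,
    "Task communications": 29,
    "Comprehensive MOC": 28,
    "Cross functional communication": 23,
    "Compliance with procedures": 15,
    "Design guidelines and standards": 14,
}
OTHER_COUNT = 160


def top_modes_covering(named_counts, other_count=0, threshold_pct=50):
    """Return the most frequent named modes until their cumulative share reaches threshold_pct."""
    total = sum(named_counts.values()) + other_count
    ranked = sorted(named_counts.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0
    for mode, count in ranked:
        # Integer comparison avoids floating-point issues right at the threshold.
        if cumulative * 100 >= threshold_pct * total:
            break
        selected.append(mode)
        cumulative += count
    return selected


top_modes = top_modes_covering(NAMED_MODE_COUNTS, OTHER_COUNT)
for mode in top_modes:
    print(mode)
```

If the named modes never reach the threshold, the function returns all of them, which signals that the "Other" bucket is too large to ignore.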


Page 23

Impact of Typical Approach

[Figure: the work process (Review Incident Reports, Identify Common Failures, Identify Common Root Causes, Identify Common Manifestations, Analyze Gaps in Systems, Define Practice Improvements, Implement Practice Changes, Monitor Impact of Changes) shown with Site Incident Reports, Site Practice Standards, and Site Continuous Improvement Program]

• Typical programs look at individual incidents for root causes

• Action plans developed to address root causes

• Operations practice failure modes are NOT explicitly identified

• Manifestations of root causes are NOT captured to help identify gaps in management systems and operations practices

• Continuous improvement programs lack input from incident based gap analysis


Page 24

Value of Failure Mode Information: Common Failures vs. Root Causes

ASM Approach

Failure Modes                         #      %
Hazard analysis/communication        79    15%
First line leadership                65    12%
Continuous improvement               60    11%
Safety culture                       36     7%
Initial and refresher training       30     6%
Task communications                  29     5%
Comprehensive MOC                    28     5%
Cross functional communication       23     4%
Compliance with procedures           15     3%
Design guidelines and standards      14     3%
Other failure modes                 160    30%
TOTAL                               539

Typical Approach

Root Causes                                                  #      %
No communication                                            71     8%
Crew Teamwork Needs Improvement                             58     7%
Hazard Analysis Needs Improvement                           46     5%
Management of Change (MOC) Needs Improvement                40     5%
Displays Need Improvement                                   35     4%
No supervision                                              34     4%
Corrective Action Needs Improvement                         33     5%
No Standards, Policy or Administrative Controls (SPAC)      32     4%
SPAC confusing or incomplete                                32     4%
SPAC not followed                                           29     3%
Others                                                     160    51%
TOTAL                                                      432

• In our analysis, Common Failure Modes correspond to specific ASM Effective Operations Practices

• Moreover, failure modes need to map to a site’s operations practice standards, policy and guidelines
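The Typical Approach root cause figures, as they appear in this transcript, do not add up to the printed total. The short check below uses only the printed numbers; the variable names are hypothetical, and the source does not say whether the discrepancy is a slide error or an extraction artifact.

```python
# Root cause counts as printed in the Typical Approach table (names abbreviated).
ROOT_CAUSE_COUNTS = {
    "No communication": 71,
    "Crew Teamwork Needs Improvement": 58,
    "Hazard Analysis Needs Improvement": 46,
    "MOC Needs Improvement": 40,
    "Displays Need Improvement": 35,
    "No supervision": 34,
    "Corrective Action Needs Improvement": 33,
    "No SPAC": 32,
    "SPAC confusing or incomplete": 32,
    "SPAC not followed": 29,
}
PRINTED_OTHERS = 160
PRINTED_TOTAL = 432

listed = sum(ROOT_CAUSE_COUNTS.values())
rows_sum = listed + PRINTED_OTHERS

print("Sum of the ten listed root causes:", listed)
print("Listed + printed Others:", rows_sum)
print("Printed TOTAL:", PRINTED_TOTAL)
print("Rows add up to printed TOTAL:", rows_sum == PRINTED_TOTAL)
```

One reading that fits most of the printed percentages is that 432 is the Others count and the overall total is 842 (410 + 432); this is an inference from the arithmetic, not something the slides state.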


Page 25

Improvement Opportunities for First Line Leadership

Root Cause Profile                       #      %
No supervision                          14    18%
Crew teamwork needs improvement         11    14%
SPAC[1] not followed                     8    10%
MOC needs improvement                    6     8%
Pre-job briefing needs improvement       5     6%
Other                                   36    45%
Total                                   80   100%

[1] SPAC: standards, policies, administrative controls (standardized work processes, rules, procedures)

[Figure: pie chart of the root cause profile: No supervision 18%, Crew teamwork needs improvement 14%, SPAC not followed 10%, Management of change (MOC) needs improvement 8%, Pre-job briefing needs improvement 6%, Other 45%]

• Improvement opportunities are identified by extracting the root cause profiles for each common failure mode

– Profiles show distribution of common root causes (i.e., “why the failure occurred”) across incidents
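Extracting a root cause profile amounts to filtering root cause records by common failure mode and counting. The sketch below rebuilds the First Line Leadership profile from its printed counts; the record layout `(failure_mode, root_cause)`, the function name `root_cause_profile`, and the round-half-up convention are assumptions, and the records are expanded from the published counts rather than taken from the underlying incident data.

```python
from collections import Counter

# Printed counts for the First Line Leadership profile.
PROFILE_COUNTS = {
    "No supervision": 14,
    "Crew teamwork needs improvement": 11,
    "SPAC not followed": 8,
    "MOC needs improvement": 6,
    "Pre-job briefing needs improvement": 5,
    "Other": 36,
}

# Expand the counts into one (failure mode, root cause) record per occurrence.
records = [
    ("First line leadership", cause)
    for cause, n in PROFILE_COUNTS.items()
    for _ in range(n)
]


def root_cause_profile(records, failure_mode):
    """Return (root cause, count, whole percent) rows for one common failure mode."""
    counts = Counter(cause for mode, cause in records if mode == failure_mode)
    total = sum(counts.values())
    return [
        (cause, n, (200 * n + total) // (2 * total))  # halves rounded up (assumed)
        for cause, n in counts.most_common()
    ]


profile = root_cause_profile(records, "First line leadership")
for cause, n, pct in profile:
    print(f"{cause:<36}{n:>3}{pct:>5}%")
```

The profile total (80) exceeds the 65 First line leadership failures in the failure mode table, which is consistent with a single failure having more than one root cause.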


Page 26

Improvement Opportunity First Line Leadership

• Identify the root cause manifestations for each profile
– Specific reasons the failures occurred across incidents
– Manifestations are “indicators” of failures
– Potential candidates for leading indicators of incidents

Root Cause (from profile): No supervision
Manifestation:
– Checking procedure progress for area of responsibility
– Being at job site and maintaining situation awareness
– Identifying and addressing risk to personnel
– Monitoring high risk activities for problems/issues

Root Cause (from profile): Crew teamwork needs improvement
Manifestation:
– Ensuring team members (eg ops, maint) stay coordinated
– Not correcting/communicating known problems
– Team not focusing on critical activities/indicators (tunnel vision)
– Supervisor not keeping track of big picture, losing sight of hazards
– Team members not questioning when evidence of problems
– Enforcing violations of practices/procedures (esp related to safety)

Rating column (digits as extracted, row alignment lost): 9939; 313133
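Since manifestations are described as the basis for an audit checklist, a small sketch can show how such a list might be generated. The checklist wording and the function name `audit_checklist` are hypothetical, and the grouping of the four manifestations under "No supervision" is an assumption based on the layout of this slide.

```python
# Manifestations assumed to belong to the "No supervision" root cause.
MANIFESTATIONS = {
    "No supervision": [
        "Checking procedure progress for area of responsibility",
        "Being at job site and maintaining situation awareness",
        "Identifying and addressing risk to personnel",
        "Monitoring high risk activities for problems/issues",
    ],
}


def audit_checklist(manifestations):
    """Turn each manifestation into one unchecked audit item tagged with its root cause."""
    items = []
    for root_cause, entries in manifestations.items():
        for entry in entries:
            items.append(f"[ ] ({root_cause}) Is there evidence of weakness in: {entry}?")
    return items


checklist = audit_checklist(MANIFESTATIONS)
for item in checklist:
    print(item)
```

Each item keeps its root cause tag so that audit findings can be counted back against the root cause profile.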


Page 27

Conclusions

• If analysis is limited to individual incident analysis, the tendency is to address root causes specific to the incident

• A single incident focus may miss the larger management system contributions to safety risk

• Hence, the improvement may not have the intended positive impact


Page 28

Conclusions

• If the analysis is based on a sample of incidents (either common failures or root causes), analysts will still make assumptions about how to address high-level root causes such as “No supervision”
• Manifestations ground improvement opportunities in the incident data
– increasing the likelihood of understanding the operations practice or management system vulnerabilities


Page 29

Discussion/Questions

• Thank You!

• Questions and/or Comments?


Page 30

Abstract

• The Abnormal Situation Management® Consortium funded a study to investigate common failure modes and root causes associated with operations practices. The study team analyzed 20 public and 12 private incident reports using the TapRoot® methodology to identify root causes. These root causes were mapped to operations practice failures. This presentation describes the top ten operations failure modes identified in the analysis. Specific recommendations include how to analyze plant incident reports to better understand the sources of systemic failures and improve plant operating practices.

• This research study was sponsored by the Abnormal Situation Management® (ASM®) Consortium. ASM and Abnormal Situation Management are registered trademarks of Honeywell International, Inc.