Safety in Interactive Systems Christopher Powell.


Safety?

• Usually in HCIT, we have talked about properties of the interactive system as being facets of usability

• However, there are other properties that we often need to consider – safety is one of them

• Informally, safety in interactive systems broadly means preventing incidents that can lead to catastrophic loss, whether they arise from the human, the machine or the organisation

Human error

• We often talk about human error – if you have read the QUAN primer on error, you know that the words Human Error are kind of meaningless

• Why is it meaningless though?
– Because life is messy … there are lots of routes to any given error
– Observed phenomena may be caused by many underlying factors, some related to the human, some to the device and some to the environment/context

Reason’s “Swiss Cheese” Model of Human Error

• James Reason proposed a model in 1990 for understanding where errors occur, categorising them by the process in which they occur.

• He proposed that errors could occur in one process and be propagated forward through a system.

• Forms the basis for the Human Factors Analysis and Classification System (HFACS)

Swiss Cheese Model

• Each process is a layer of cheese, with holes where errors can slip through:

HFACS: Swiss Cheese Model

• In most cases safe-guards in other processes catch them and correct them.

HFACS: Swiss Cheese Model

• However, sometimes the holes in the system line up, and an error makes it all the way to the end, with the effects of the error being realised.
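As a rough illustration of the layered model, the sketch below treats each process as a layer with some chance of letting an error through, and multiplies those chances to estimate how often the holes line up. The layer names and probabilities are invented for illustration, and treating the layers as independent is a simplifying assumption rather than part of Reason's model.

```python
from math import prod

# Hypothetical layers and "hole" probabilities (illustrative values only).
layers = {
    "process A": 0.10,
    "process B": 0.20,
    "process C": 0.25,
    "process D": 0.05,
}

def p_error_reaches_end(hole_probabilities):
    """Probability that an error slips through every layer,
    assuming the layers fail independently of one another."""
    return prod(hole_probabilities)

p_all = p_error_reaches_end(layers.values())
print(f"Chance the holes line up: {p_all:.5f}")
```

Even with fairly leaky individual layers, the combined chance is small, which is the point of having safeguards in several processes. In real systems the layers are rarely independent, so a figure like this is best read as an intuition aid only.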

Unsafe Acts

• Unsafe acts are those that are tied to the action cycle involving humans and system interaction.

Errors and violations

Unsafe Acts

• This looks kind of familiar:
– Errors deal with perception, evaluation, integration and the execution of actions.
– Violations deal with goals, intentions and action specifications.

HEA and HRA

• How is this classification useful to us?

• In respect to error, there are two different processes that we can undertake:
– Human Error Analysis – trying to capture where errors can happen in the system, either proactively (evaluation prior to incident) or retrospectively
– Human Reliability Analysis – trying to capture the probability that a human will have a fault in the system at some point

Human Error Analysis Techniques

• There are around 40 different techniques that I could point to in the literature

• Many of these have little empirical basis and have never been validated

• Some have had some work done, but it is debatable how well they work

• We’re going to look at different types of error analysis methods over the next couple of hours

Error Modes

• Most modern techniques use the idea of an “error mode”

• Error modes are categories of phenomena that we see when an incident occurs in the world

• These phenomena could have many causes – we can track back along the causal chain

• Alternatively, we can take the phenomena we see (or suspect will happen) and compare them to interface components to see what might happen in the future

SHERPA

Background

• SHERPA stands for “The Systematic Human Error Reduction and Prediction Approach”

• Developed by Embrey in the mid-1980s for the nuclear reprocessing industry (but you cannot find the original reference!)

• Has more recently been applied with notable success to a number of other domains

• (Baber and Stanton 1996, Stanton 1998, Salmon et al. 2002, Harris et al. 2005…)

• Has its roots in Rasmussen’s “SRK” model (1982)…

SRK Reminder …

• Skill-based actions
– Those that require very little conscious control, e.g. driving a car on a known route

• Rule-based actions
– Those which deviate from “normal” but can be dealt with using rules stored in memory or rules which are otherwise available, e.g. setting the timer on an oven

• Knowledge-based actions
– The highest level of behaviour, applicable when the user has either run out of rules to apply or did not have any applicable rules in the first place. At that point the user is required to use in-depth problem-solving skills and knowledge of the mechanics of the system to proceed, e.g. pilot response during the QF32 incident

SHERPA Taxonomy

• SHERPA, like many HEI techniques, has its own cut-down taxonomy, here drawn up taking cues from SRK

• Uses prompts that are firmly based on operator behaviours, as opposed to listing conceivable errors as in taxonomic approaches

• The taxonomy was “domain specific” (nuclear), but SHERPA has still been shown to work well across other domains (see references)

• Rather than have the evaluator consider what psychological level the error has occurred at, the taxonomy simplifies this into the most likely manifestations (modes) for errors to occur

SHERPA Taxonomy

• The headings for SHERPA’s modes are (expanded next):

– Action (doing something like pressing a button)
– Retrieval (getting information from a screen or instruction list)
– Checking (verifying action)
– Selection (choosing one of a number of options)
– Information Communication (conversation/radio call etc.)

Taxonomy - Action

• Action modes:
– A1: Operation too long/short
– A2: Operation mistimed
– A3: Operation in wrong direction
– A4: Operation too little/much
– A5: Misalign
– A6: Right operation on wrong object
– A7: Wrong operation on right object
– A8: Operation omitted
– A9: Operation incomplete
– A10: Wrong operation on wrong object

Taxonomy - Retrieval & Checking

• Retrieval modes are:
– R1: Information not obtained
– R2: Wrong information obtained
– R3: Information retrieval incomplete

• Checking modes are:
– C1: Check omitted
– C2: Check incomplete
– C3: Right check on wrong object
– C4: Wrong check on right object
– C5: Check mistimed
– C6: Wrong check on wrong object

Taxonomy - Selection & Comms.

• Selection modes are:
– S1: Selection omitted
– S2: Wrong selection made

• Information Communication modes are:
– I1: Information not communicated
– I2: Wrong information communicated
– I3: Information communication incomplete
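For analysts who keep their worksheets in code, the taxonomy above can be held as a simple lookup table. The following is a minimal sketch: the codes and descriptions are taken from the slides, while the names `SHERPA_MODES`, `modes_for` and `describe` are hypothetical helpers, not part of any published SHERPA tool.

```python
# SHERPA error modes, keyed by behaviour category (codes as listed above).
SHERPA_MODES = {
    "Action": {
        "A1": "Operation too long/short",
        "A2": "Operation mistimed",
        "A3": "Operation in wrong direction",
        "A4": "Operation too little/much",
        "A5": "Misalign",
        "A6": "Right operation on wrong object",
        "A7": "Wrong operation on right object",
        "A8": "Operation omitted",
        "A9": "Operation incomplete",
        "A10": "Wrong operation on wrong object",
    },
    "Retrieval": {
        "R1": "Information not obtained",
        "R2": "Wrong information obtained",
        "R3": "Information retrieval incomplete",
    },
    "Checking": {
        "C1": "Check omitted",
        "C2": "Check incomplete",
        "C3": "Right check on wrong object",
        "C4": "Wrong check on right object",
        "C5": "Check mistimed",
        "C6": "Wrong check on wrong object",
    },
    "Selection": {
        "S1": "Selection omitted",
        "S2": "Wrong selection made",
    },
    "Information Communication": {
        "I1": "Information not communicated",
        "I2": "Wrong information communicated",
        "I3": "Information communication incomplete",
    },
}

def modes_for(category):
    """Return the candidate error modes to review for a task in this category."""
    return SHERPA_MODES[category]

def describe(code):
    """Look up a mode description from its code, e.g. 'R2'."""
    for modes in SHERPA_MODES.values():
        if code in modes:
            return modes[code]
    raise KeyError(code)

print(describe("R2"))
```

Deciding which of the candidate modes are credible for a given task remains the analyst's judgement; the table only supplies the prompts.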

SHERPA Methodology

• SHERPA begins like many HEI methods, with a Hierarchical Task Analysis (HTA)

• Then the ‘credible’ error modes are applied to each of the bottom-level tasks in the HTA

• The analyst categorises each task into a behaviour, and then determines if any of the error modes provided are credible

• Each credible error is then considered in terms of consequence, error recovery, probability and criticality

Step 1 - HTA

• Using the example of a Sat-nav

Step 2 - Task Classification

• Each task at the bottom level of the HTA is classified into a category from the taxonomy
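The classification step can be sketched as a walk over the task hierarchy that collects only the bottom-level tasks. This is an illustrative sketch: the `Task` class and the sat-nav task names are hypothetical stand-ins, since the HTA diagram itself is not reproduced in this transcript.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    name: str
    subtasks: List["Task"] = field(default_factory=list)
    category: Optional[str] = None  # SHERPA category, set on bottom-level tasks only

def bottom_level(task):
    """Yield the leaf tasks of the hierarchy, left to right."""
    if not task.subtasks:
        yield task
    else:
        for sub in task.subtasks:
            yield from bottom_level(sub)

# Hypothetical fragment of a sat-nav HTA (task names are assumptions).
hta = Task("Set destination", [
    Task("Switch on device", category="Action"),
    Task("Enter destination", [
        Task("Press 'navigate to'", category="Action"),
        Task("Choose 'address' from menu", category="Selection"),
        Task("Read postcode from note", category="Retrieval"),
    ]),
])

classified = [(t.name, t.category) for t in bottom_level(hta)]
for name, category in classified:
    print(f"{category:10} {name}")
```

Only leaves carry a category here, which mirrors the rule that classification is applied at the bottom level of the HTA rather than to the goals above it.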

(Figure: bottom-level HTA tasks labelled with their categories: Action, Action, Selection, Retrieval.)

Step 3 – Error Identification

• For the category selected for a given task, the credible error modes are selected and a description of the error provided

Selection: “Wrong selection made” – The user makes the wrong selection, clicking “point of interest” or something similar

Retrieval: “Wrong information obtained” – The user reads the wrong postcode and inputs it

Step 4 – Consequence Analysis

• For each error, the analyst considers the consequences

The user makes the wrong selection, clicking “point of interest” or something similar… This would lead to the wrong menu being displayed, which may confuse the user

The user reads the wrong postcode and inputs it… Depending on the validity of the entry made, the user may plot a course to the wrong destination

Step 5 – Recovery Analysis

• For each error, the analyst considers the potential for recovery

The user makes the wrong selection… There is good recovery potential from this error, as the desired option will not be available and back buttons are provided. This may take a few menus before the correct one is selected though

The user reads the wrong postcode and inputs it… The recovery potential from this is fair, in that the sat-nav shows the duration and an overview of the route, so depending on how far wrong the postcode is, it may be noticed at that point

Steps 6, 7 – Probability & Criticality

• Step 6 is an ordinal probability analysis, where L/M/H is assigned to the error based on previous occurrence
– This requires experience and/or subject matter expertise

• Step 7 is a criticality analysis, which is done in a binary fashion (the error is either critical or it is not)

Step 8 – Remedy

• Step 8 is a remedy analysis, where error reduction strategies are proposed under the headings:
– Equipment, Training, Procedures, Organisational

Equipment: The use of the term ‘address’ may confuse some people when intending to input a postcode… as postcode is a common entry, perhaps it should not be beneath ‘address’ in the menu system

Procedure: The user should check the destination/postcode entered for validity. The device design could display the destination more clearly than it does to offer confirmation to the user
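Steps 3 to 8 together produce one row per credible error, and that row can be sketched as a small record type. In the sketch below the field names, the `prioritise` helper, the task names, and the L/M/H and criticality values given to the two sat-nav errors are all assumptions made for illustration; the slides do not state ratings for these errors.

```python
from dataclasses import dataclass, field

@dataclass
class SherpaEntry:
    task: str
    mode: str           # error mode code from the taxonomy, e.g. "R2"
    description: str    # step 3
    consequence: str    # step 4
    recovery: str       # step 5
    probability: str    # step 6, ordinal: "L", "M" or "H"
    critical: bool      # step 7, binary
    remedies: dict = field(default_factory=dict)  # step 8: heading -> strategy

    def __post_init__(self):
        if self.probability not in ("L", "M", "H"):
            raise ValueError("probability must be 'L', 'M' or 'H'")

_RANK = {"H": 0, "M": 1, "L": 2}

def prioritise(entries):
    """Critical errors first, then by descending ordinal probability."""
    return sorted(entries, key=lambda e: (not e.critical, _RANK[e.probability]))

entries = [
    SherpaEntry(
        task="Choose 'address' from menu", mode="S2",
        description="User clicks 'point of interest' or something similar",
        consequence="Wrong menu is displayed, which may confuse the user",
        recovery="Good: back buttons are provided",
        probability="M", critical=False,      # assumed ratings
        remedies={"Equipment": "Do not place postcode entry beneath 'address'"},
    ),
    SherpaEntry(
        task="Read postcode from note", mode="R2",
        description="User reads the wrong postcode and inputs it",
        consequence="Course may be plotted to the wrong destination",
        recovery="Fair: route overview and duration are shown",
        probability="L", critical=True,       # assumed ratings
        remedies={"Procedures": "Check the destination/postcode entered"},
    ),
]

for e in prioritise(entries):
    print(e.mode, e.probability, "critical" if e.critical else "non-critical")
```

Sorting critical errors ahead of merely likely ones is one possible convention for deciding where remedies are considered first; SHERPA itself does not prescribe an ordering.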

Output

• Output of a full SHERPA analysis (Stanton et al 2005)


Summary

• SHERPA is an alternative to HE HAZOP

• Claims in the literature point to it being “more easy to learn” and “more easy to apply by novices” – which is attractive

• Founded on some of the roots of HF work done in the 1970s but …

• … simplifies that work into something that can be applied

References

• Rasmussen, J. (1982) Human errors, a taxonomy for describing human malfunction in industrial installations. The Journal of Occupational Accidents, 4, 22.

• Baber, C. & N. A. Stanton (1996) Human error identification techniques applied to public technology: Predictions compared with observed use. Applied Ergonomics, 27, 119-131.

• Stanton, N. (1998) Human Factors in Consumer Products. CRC Press.

• Salmon, P., N. Stanton, M. Young, D. Harris, J. Demagalski, A. Marshall, T. Waldman & S. Dekker (2002) Using Existing HEI Techniques to Predict Pilot Error: A Comparison of SHERPA, HAZOP and HEIST. HCI-02 Proceedings.

• Harris, D., N. A. Stanton, A. Marshall, M. S. Young, J. Demagalski & P. Salmon (2005) Using SHERPA to predict design-induced error on the flight deck. Aerospace Science and Technology, 9, 525-532.

• Stanton, N., P. Salmon, G. Walker, C. Baber & D. Jenkins (2005) Human Factors Methods. Ashgate.


Human Error Analysis

• The qualitative nature of the techniques in HEA allows the participants to explore the cause of the error as opposed to the effects of the error.

• This differs from quantification in that it is not about when or if an error will happen but instead about how and why it will happen.

• Most techniques involve asking detailed questions about where errors could occur in a design.

• We have just seen two examples, with SHERPA and HEHAZOP – but there are problems …

Behavioural guide words

• ‘Traditional’ HRA guidewords for error analysis (Swain & Guttmann, 1983):
– Errors of Omission: omit actions / sub-goals
– Commission: substitute actions / sub-goals; carry out action incorrectly; insert extraneous action
– Errors of Sequence: actions in wrong order
– Repetition: actions repeated unnecessarily
– Qualitative error: too much / too little
– Time error: too early / too late / too long
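One way to see how guide words drive an analysis is to pair every guide word with a task step and turn each pair into a question for the analyst. The sketch below does this for the guide words listed above; the `prompts` function and the wording of the generated questions are assumptions for illustration, not a published procedure.

```python
# Guide words and their meanings, as listed above.
GUIDEWORDS = {
    "Omission": ["omit actions / sub-goals"],
    "Commission": [
        "substitute actions / sub-goals",
        "carry out action incorrectly",
        "insert extraneous action",
    ],
    "Sequence": ["actions in wrong order"],
    "Repetition": ["actions repeated unnecessarily"],
    "Qualitative error": ["too much / too little"],
    "Time error": ["too early / too late / too long"],
}

def prompts(step):
    """Pair one task step with every guide word so the analyst can ask
    whether that deviation is credible for the step."""
    return [
        f"{step} | {word}: could the operator {meaning}?"
        if word in ("Omission", "Commission")
        else f"{step} | {word}: could this involve {meaning}?"
        for word, meanings in GUIDEWORDS.items()
        for meaning in meanings
    ]

questions = prompts("Close the isolation valve")
for q in questions:
    print(q)
```

The output is only a checklist: for each question the analyst still has to decide whether the deviation is credible and what its consequences would be.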

Examples of HEHAZOP Guidewords

• Omission: operator fails to close the valve.

• Commission: operator turns the valve clockwise, thereby opening it wider rather than closing it.

• Commission (extraneous): instead of closing the isolation valve, operator switches off the pump because the pump on-off switch is close to the isolation valve (“doing the wrong thing”)

Some problems of definition

(Figure: four variations of omission, shown on a time axis t relative to the time interval when the action was required: missing, delayed, premature and replaced (commission). Hollnagel, 1998.)

Some problems of definition

• Task: Entering an altitude value into the altitude alert window in an aircraft cockpit:

• “Substitution error” could be
– Doing something other than entering data
– Entering data into a different device
– Entering a distance value instead of the altitude

• “Commission error” is not very constraining as a guide due to the large number of substitutions possible

• What is needed is more cognitive analysis for attributing error causes


THEA: Technique for Human Error Analysis

• A product's safety does not always need to be quantified at every stage of development.

• Early designs and prototypes can be examined early in the iterative design cycle to determine if there are major failures.

• As a result, qualitative analyses can be used instead, and these can be completed by people with comparatively little training relative to quantification techniques.

• One example of this is THEA (Fields, Harrison, Wright, 2001).

THEA: Scenario Template

• In each scenario, the evaluator completes the following headings:

• Agents: human agents involved in the interaction with the system.

• Rationale: the reasons the scenario is being examined.

• Situation and environment: a description of the setting, and the environmental triggers and events that occur during the scenario.

• Task context: what tasks are performed (high level), what procedures are being used, are the procedures violated at any time?


• System Context: what devices are involved, what known usability problems are there and what effects can users have on the system that affects the flow of the scenario?

• Action: how are the tasks carried out? How do they relate to the overall goals?

• Exceptional Circumstances: how might things evolve differently if known exceptions occur?

• Assumptions: are there any implicit conditions or activities going on in the environment that should be detailed?
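The eight headings of the scenario template map naturally onto a record with one field per heading. The following is a minimal sketch, assuming a hypothetical `TheaScenario` class and a hypothetical half-finished sat-nav scenario; it shows how an evaluator might check which headings still need to be completed.

```python
from dataclasses import dataclass, fields

@dataclass
class TheaScenario:
    # One field per heading of the THEA scenario template.
    agents: str = ""
    rationale: str = ""
    situation_and_environment: str = ""
    task_context: str = ""
    system_context: str = ""
    action: str = ""
    exceptional_circumstances: str = ""
    assumptions: str = ""

    def missing_headings(self):
        """Names of the headings the evaluator has not yet filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

# Hypothetical draft with only two headings completed.
draft = TheaScenario(
    agents="Driver using a sat-nav",
    rationale="Postcode entry is a common task with scope for error",
)
print(draft.missing_headings())
```

A check like this only confirms that every heading has some text; whether the description is adequate is still a matter for the evaluator.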

THEA: Scenario Example

Scenario 2 – In Flight Refuelling (IFR)

• Agents: A pilot engaged in in-flight refuelling (IFR) activities; Tanker crew; Eurofighter MHDD

• Rationale: The scenario involves the pilot in considerable fault finding analysis as well as some difficult decision making arising from two fuel system abnormalities. One of these failures may be regarded as latent, as neither pilot nor system can detect the failed shut refuel valve since this is its default position

• Situation and environment: The scenario takes place at a designated refuelling altitude over ocean and in fine visual weather conditions.

• Task Context: The pilot is required to draw on extensive task and system knowledge, as well as experience, in order to take appropriate actions at the appropriate time

• System Context: The fuselage forward group (FRG) experiences a refuel valve (RSOV) failure shortly after take off. This cannot be detected by either aircraft or fuel management computers and thus exhibits no external fault manifestation. The first indication of a problem will be after refuelling has commenced when the MHDD will show that the FRG is not filling. Just as this becomes apparent, a left-hand hydraulic system failure occurs

• Action: The pilot must diagnose how and why the FRG does not appear to be filling with fuel. At the same time, the hydraulic failure complicates fault finding and presents the pilot with difficult decisions and task prioritisation issues

• Exceptional Circumstances: This scenario is constructed around the production Eurofighter aircraft since the problems encountered are anticipated as being harder to correct than in the development aircraft. In the latter aircraft, current procedure simply requires the pilot to terminate IFR and land asap. This scenario would also involve different decisions to be made if the aircraft was refuelling over densely populated areas such as those encountered in Europe

• Assumptions: 1. There are no complications other than those presented in the scenario

THEA: Creation of HTA

• The task information in the scenario creates a basis for an HTA.

• For each task in the hierarchy, questions are asked about the performance of humans in four different stages (that look very familiar!):
– Goals
– Plans
– Perception/Interpretation/Evaluation
– Action

• For each error detected the evaluators can record consequences of the error and possible error reduction measures such as changes to design.

THEA: HTA Example

• However, in a more advanced task model, it may be necessary to treat each subtask as a goal itself. Consider the following:


• The evaluation begins with the level above the lowest level tasks that have been modelled.


• When analysis of this subgoal is complete, it can be considered a task in the higher level plan.
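The order of analysis described above amounts to a bottom-up (post-order) walk over the goals in the task model: each subgoal is analysed before the higher-level goal whose plan uses it. The sketch below illustrates that ordering; the `analysis_order` function and the task names are hypothetical, since the HTA diagrams are not reproduced in this transcript.

```python
def analysis_order(goal):
    """Return goal names in the order they would be analysed.

    `goal` is a (name, subtasks) pair; lowest-level tasks have an empty
    subtask list and are analysed as part of their parent goal, so they
    do not appear in the result on their own.
    """
    name, subtasks = goal
    if not subtasks:
        return []
    order = []
    for sub in subtasks:
        order.extend(analysis_order(sub))   # finish every subgoal first
    order.append(name)                      # then treat it as a task in the parent plan
    return order

# Hypothetical task model (names are assumptions).
model = ("Handle refuelling fault", [
    ("Diagnose fuel problem", [
        ("Check fuel display", []),
        ("Compare tank levels", []),
    ]),
    ("Decide whether to continue", []),
])

order = analysis_order(model)
print(order)
```

Here "Diagnose fuel problem" is analysed first because it sits one level above the lowest-level tasks, after which it is treated as a single task within the plan for "Handle refuelling fault".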

THEA: Questions divided by cognitive model stage

THEA Summary

• For each goal and its related plans and tasks, an evaluator must answer a set of questions to determine possible errors. These can be used to inform further designs.

• The goal of this method is not to quantify error, but to help identify possible error conditions in a scenario early in prototype design.

• For a large task model, this method becomes very time consuming.

• This method does not catch collaborative errors.

Conclusions

• Qualitative approaches are useful for examining prototypes and early designs to identify potential trouble spots where errors could happen.

• They have a different purpose from quantification. Quantification examines the probability that something bad will happen, whereas the above approaches discuss how and why errors could occur.
