
Applied Ergonomics 42 (2011) 563-574

A usability evaluation toolkit for In-Vehicle Information Systems (IVISs)

Catherine Harvey a,*, Neville A. Stanton a, Carl A. Pickering b, Mike McDonald a, Pengjun Zheng a

a Transportation Research Group, School of Civil Engineering and the Environment, University of Southampton, Highfield Campus, University Road, Southampton, Hampshire SO17 1BJ, UK
b Jaguar and Land Rover Technical Research, Jaguar Cars, Engineering Centre, Whitley, Coventry, CV3 4LF, UK

Article info

Article history: Received 19 August 2009; Accepted 22 September 2010

Keywords: Usability; Evaluation; In-Vehicle Information Systems

* Corresponding author. Tel.: +44 2380593713. E-mail address: [email protected] (C. Harvey).


Abstract

Usability must be defined specifically for the context of use of the particular system under investigation. This specific context of use should also be used to guide the definition of specific usability criteria and the selection of appropriate evaluation methods. There are four principles which can guide the selection of evaluation methods, relating to the information required in the evaluation, the stage at which to apply methods, the resources required and the people involved in the evaluation. This paper presents a framework for the evaluation of usability in the context of In-Vehicle Information Systems (IVISs). This framework guides designers through defining usability criteria for an evaluation, selecting appropriate evaluation methods and applying those methods. These stages form an iterative process of design-evaluation-redesign with the overall aim of improving the usability of IVISs and enhancing the driving experience, without compromising the safety of the driver.

© 2010 Elsevier Ltd and The Ergonomics Society. All rights reserved.

1. An overview of usability evaluation

One of the most popular definitions of usability was provided by the International Organisation for Standardisation (1998):

[The] extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use. (1998, p. 2)

There have been many more useful definitions (in particular see Bevan, 2001; Nielsen, 1993; Norman, 1983; Shackel, 1986; Shneiderman, 1992); however, evidence has shown that there is unlikely ever to be a single universally accepted definition of usability (Harvey et al., in press-a). This is because consideration of the context of use is essential in defining usability criteria and this will be different for each system under investigation. One of the main purposes of defining criteria for usability is so that it can be evaluated. Usability evaluation is used to assess the extent to which a system's human-machine interface (HMI) complies with the various usability criteria which are applicable in its specific context of use. The results of a usability evaluation can be used to indicate the likely success of a product with its intended market, to compare two or more similar products, to provide feedback to inform design, and even to estimate possible training requirements associated with the product (Butler, 1996; Rennie, 1981).


2. Preparing for a usability evaluation

The main aim of this work is to develop a usability evaluation framework for In-Vehicle Information Systems (IVISs). IVISs are typically menu-based systems which enable most secondary vehicle functions to be integrated into one system and accessed via a single screen-based interface. Secondary functions relate to the control of communication, comfort, infotainment and navigation; primary functions, on the other hand, are those involved in maintaining safe control of the vehicle, i.e. driving (Lansdown, 2000). Before developing the IVIS usability evaluation framework, a number of features relating to this specific system had to be defined. These related to the interactions which occur between the tasks, users and system and the context of use of IVISs. It was also essential to define a comprehensive list of criteria for the usability of IVISs, in order to provide some targets for the evaluation. Based on the authors' experience of developing this evaluation framework, it is recommended that prior to conducting any usability evaluation, evaluators follow three principles to ensure that important preliminary information is carefully defined: these are presented in Table 1. The application of each principle to IVISs is described in the following sections.

2.1. Defining the task-user-system interaction for IVISs

The usability of an IVIS is affected by the HMI, which determines how well a driver can input information, receive and understand outputs and monitor the state of the systems (Cellario, 2001; Daimon and Kawashima, 1996; Stanton and Salmon, 2009).



Table 1. Three general principles for preparing an evaluation.

Define the task-user-system interaction: These three factors determine the usability of a system and the way in which they will be represented in the evaluation needs to be determined. Unlike the task and the system, the designer has no control over the user of the system; however, the needs of the user and their conceptual model of the interaction must be considered in design (Landauer, 1997; Norman, 2002; Preece et al., 2002; Walker et al., 2001).

Define the context of use: The usability of a system is dependent on the context within which it is used. This is because certain attributes of usability will be more or less important depending on the circumstances in which a system is used (Chamorro-Koc et al., 2008; Greenberg and Buxton, 2008). All factors which influence this context of use need to be identified.

Define usability criteria: Before a system can be evaluated, evaluators need to know which aspects of the interaction are relevant to usability. Usability criteria, which define a target level of usability, need to be developed (Harvey et al., in press-a).


The driver can input information to the IVIS via two modes: physical, which for most IVISs involves movements such as pushing buttons and turning dials; and verbal, which can involve the user speaking commands which the system interprets and responds to. IVIS outputs are generally made through three modes: visual, auditory and physical, of which the first is most widely used. As well as sending and receiving information to and from the IVIS, the driver must also process this information via the cognitive mode. The success of these interactions will be influenced by the structure of tasks and the design of the system interface, which the designer is able to control. The interaction will also be affected by the characteristics of users. It is not possible for designers to control these characteristics; however, to ensure a high level of usability these characteristics must be accounted for in design. This is a difficult skill as there is a tendency for people, including designers, to believe that they are aware of the determinants of their own behaviour and satisfaction and that their own needs and perceptions of a particular system are equally applicable to everyone else. This is described as the 'egocentric intuition fallacy' (Landauer, 1997). Norman (2002) recommended that in order to avoid making these mistakes, designers must be able to instil in the user the appropriate conceptual model of an HMI through good design. Evaluation with users is probably the most effective way to ensure this user-centred design because their performance and attitudes will highlight the variability which designers find almost impossible to predict.

2.2. Defining the context of use for IVISs

A thematic analysis was conducted in the context of IVISs to identify six main factors which influence usability (Harvey et al., in press-a):

- Dual task environment
- Range of users
- Environmental conditions
- Training provision
- Frequency of use
- Uptake

The context of use within which the usability of an IVIS must be defined is perhaps more important than for many other products

because it is closely linked to additional, safety-critical interactions and the impact on these must be carefully considered. Fastrez and Haué (2008) suggested that the high diversity of the driving context also increases the complexity of designing for usability, compared with other products and systems. With respect to this context of use, the IVIS should be usable by the driver within the dual task driving environment. This means that the secondary tasks performed via a usable IVIS should not interfere with the concurrent driving task. An IVIS should be usable by the entire population of potential users, which in a driving environment comprises a diverse range of user characteristics. The wider driving environment, including road, weather and in-vehicle conditions, must also be considered as an influence in this context. The design of an IVIS should account for limits in training provision and for varying frequencies of use. It should also ensure that there is successful uptake of the system by users.

2.3. Defining usability criteria for IVISs

Thirteen criteria specific to IVISs were defined in relation to the six context of use factors described in the previous section (Harvey et al., in press-a). Criteria from general definitions of usability, such as efficiency, effectiveness and satisfaction (Bevan, 2001; International Organization for Standardization, 1998; Nielsen, 1993; Norman, 1983; Shackel, 1986; Shneiderman, 1992), were adapted to suit the specific context of use for IVISs. Selection was also guided by the relevance of criteria to driver needs, which were described by Harvey et al. (in press-b) as safety, efficiency and enjoyment. The six context of use factors and thirteen IVIS usability criteria are presented in Fig. 1. These criteria collectively define usability for IVISs and each is measurable, either objectively or subjectively. This means that the usability of these systems can be comprehensively evaluated, i.e. all attributes of usability which are significant in the context of human interaction with IVISs will be covered in an evaluation guided by these criteria.

3. Selecting usability evaluation methods

The success of usability evaluation depends on the appropriateness of the selection of evaluation methods (Annett, 2002; Kantowitz, 1992). The selection of usability evaluation methods will be a matter of judgement on the part of the evaluator (Annett, 2002) and it is therefore important that he/she has as much information as possible to inform this choice and to ensure that the evaluation is not weakened by the use of inappropriate methods (Hornbæk, 2006; Kwahk and Han, 2002). Four principles to guide the method selection process were defined following a review of the literature on usability evaluation, in which many authors advised that consideration of the type of information required, the stage of evaluation, the resources required and the people involved is essential in the selection of appropriate methods (see for example, Butters and Dixon, 1998; Johnson et al., 1989; Kwahk and Han, 2002; Stanton and Young, 1999b). These four principles, presented and defined in Table 2, are closely interrelated and trade-offs will need to be carefully considered in order to identify appropriate methods in accordance with this guidance.

3.1. Information requirements for IVIS usability evaluations

The information required from an evaluation of IVIS usability was defined in the thirteen usability criteria presented in Section 2. Methods were assessed according to their abilities to produce this information. Methods were distinguished based on the type of data they deal with; specifically, whether this data is objective or subjective.


Table 2. Four general principles to guide the selection of usability evaluation methods.

Consider the type of information: The type of data produced by evaluation methods will influence the type of analysis which can be performed. Interaction times, error rates, user workload and satisfaction are just some of the measures which may be useful in an evaluation and methods should be selected accordingly. A mix of objective and subjective methods is most likely to produce a balanced assessment of usability.

Consider when to test: Evaluation should take place throughout the design process, following an iterative cycle of design-evaluate-redesign (Gould and Lewis, 1985; Hewett, 1986; Kontogiannis and Embrey, 1997; Liu et al., 2003). Methods should be selected according to their suitability at different stages of design. Methods applied at an appropriate time in the design process should be capable of identifying usability issues before they become too costly to rectify, but without suppressing the development of new ideas (Au et al., 2008; Greenberg and Buxton, 2008; Stanton and Young, 2003).

Consider the resources: The time and resource requirements of a method need to be balanced with the time and resources available for the evaluation. Resources include the site of the evaluation, the data collection equipment and the associated costs. Evaluations will also be constrained by the time available and application times should be estimated in order to aid method selection.

Consider the people: The people required for the application of a method will determine its suitability, given the personnel available for the evaluation. Expert evaluators use methods to make predictions about the usability of a system, based on their knowledge and experience (Rennie, 1981). Evaluating with users produces measures of the task-user-system interaction and is also useful for investigating subjective aspects of usability (Au et al., 2008; Sweeney et al., 1993). A mix of expert and user tests is recommended to achieve a comprehensive evaluation of usability.

Fig. 1. IVIS context of use factors and usability criteria (from Harvey et al., in press-a).


According to the usability criteria defined for IVISs, a mixture of objective and subjective methods was needed to reflect actual performance levels (e.g. effectiveness, efficiency and interference) as well as the users' opinions of the IVISs under investigation (e.g. satisfaction and perceived usefulness).

3.1.1. Objective measures

Objective measures are used to directly evaluate objects and events, whereas subjective measures assess people's perceptions of and attitudes towards these objects and events (Annett, 2002). In an evaluation of usability the objective measures of interest relate to the actual or predicted performance of the system and user during the task-user-system interaction. Objective measures of secondary-primary task interference, such as lateral/longitudinal control and visual behaviour, are determined by the driver's workload, which is likely to be increased during interactions with the IVIS, resulting in decrements in driving control. Objective measures can be used to measure secondary task performance, using data on secondary task interaction times and errors. There are also a number of methods which can predict objective performance data by modelling the task-user-system interaction using paper-based and computer-based simulations.

3.1.2. Subjective measures

Subjective measures, which involve the assessment of people's attitudes and opinions towards a system, primarily yield qualitative data. Some methods use expert evaluators to identify potential errors, highlight usability issues, and suggest design improvements.

The results of these evaluation methods will be determined to some extent by the opinions and prior knowledge of the evaluators involved and may therefore differ between evaluators. The same is true of some subjective, user-based methods, which obtain data on the opinions of a representative sample of users.

3.2. When to apply methods in IVIS usability evaluations

An IVIS will begin as an idea in a designer's mind and may eventually evolve, through various prototype stages, into a complete system. Usability evaluation methods must be appropriate to the stage in the system development process at which they are applied. An iterative process has been suggested for the evaluation of IVIS usability (see the second principle, 'consider when to test', in Table 2). This consists of a cycle of design-evaluate-redesign, which is repeated until the usability criteria are satisfied. In an iterative process, usability evaluation methods should be capable of identifying usability problems at different stages in the process and allowing these problems to be fixed before alterations to design become too costly and time consuming. Methods can be repeated at different stages of the development process to produce new information with which to further refine the design of a system (McClelland, 1991).



3.2.1. Desktop methods

Desktop methods are used to predict system usability via paper-based and computer-based simulations. They are applicable at any stage of design, providing evaluators have access to a fairly detailed specification of the interaction style and the structure of tasks. It is useful to apply desktop methods as early as possible in the design process so that the predictions they make can inform improvements to the design of IVISs before time and money is spent developing prototype systems.

3.2.2. Experimental methods

Experimental methods are used to collect data on user performance and workload, under simulated or real world conditions. They require a much higher level of resources than desktop methods and are not usually applied until later in the design process, when initial design problems have been rectified and a prototype system has been developed.

3.3. Resources available for IVIS usability evaluations

In order to evaluate the usability of an interface, the task, user and system need to be represented. The way in which the task, user and system are represented will be affected by the resources available in an evaluation.

3.3.1. Representing the system and tasks

The tasks evaluated in any study will be determined by the functionality of the prototype system which is being tested and this should represent the full range of product attributes of interest in the evaluation (McClelland, 1991). An IVIS can be represented using paper-based design drawings, system specifications, prototypes or complete systems (McClelland, 1991). The level of prototype fidelity can vary dramatically depending on the development stage, product type and features of the product under investigation (McClelland, 1991) and this will affect the validity of the results of the evaluation (Sauer and Sonderegger, 2009; Sauer et al., 2010). The costs associated with product development increase with the level of prototype fidelity, so methods which can be used with low specification prototypes and paper-based or computer-based representations will be more cost effective (Sauer et al., 2010).

3.3.2. Representing the user

The user can be represented using data generated from previous tests or estimated by an expert evaluator, as is the case with desktop methods. The user can also be represented by a sample of users who take part in experimental trials. This sample should be representative of the actual user population for the system under investigation. Experimental methods are generally more time consuming than desktop methods because the actual interaction with an IVIS needs to be performed or simulated in real time, with real users. This usually needs to be repeated under different testing conditions. Recruitment of participants and data analysis also imposes high time demands, so it may be suitable to use experimental methods to evaluate only a small number of well developed systems. In contrast, the relatively low cost and time demands of desktop methods make them more suited to evaluating a larger number of less well developed concepts.

3.3.3. The testing environment

For experimental methods, the testing environment is also an important factor. Studies of driving performance and behaviour can be conducted in the laboratory or on real roads. In a laboratory-based IVIS usability study, the driving environment is simulated. Driving simulators vary significantly in sophistication, from single screen, PC-based systems to moving base, full-vehicle mock-ups (Santos et al., 2005). Simulator studies are valuable for testing users in conditions which may not be safe or ethical in a real environment (Stanton et al., 1997). They can also collect a high volume of data in a relatively short time because driving scenarios can be activated on demand, rather than having to wait for certain conditions to occur in the real driving environment (Stanton et al., 1997). Real road studies use instrumented vehicles, equipped with various cameras and sensors, to record driving performance and behaviour. These can be conducted on a test track or on public roads. Real road studies are generally considered to provide the most realistic testing conditions and valid results; however, safety and ethical issues often limit the scope of usability evaluation in these conditions (Santos et al., 2005). In experimental usability evaluations the IVIS also needs to be simulated. The level of system prototype fidelity will be influenced by time and cost constraints and these limitations must be traded off against the validity of results.

3.4. People involved in IVIS usability evaluations

As with most systems, the evaluation of IVISs will benefit from testing with both experts and potential users. An evaluation framework which included only expert-based methods or only user-based methods could encourage designers to neglect evaluation if the relevant personnel were not readily accessible. The evaluation framework instead allows potential evaluators to select appropriate methods from a wide selection according to the people available in the circumstances.

3.4.1. Usability evaluation with users

Involving users in usability evaluation is important for assessing the task-user-system interaction, in particular for identifying the symptoms of usability problems, from which the cause must be identified and rectified (Doubleday et al., 1997). For a user trial, a sample which reflects the characteristics and needs of users, and also the variation in these characteristics and needs, is required (McClelland, 1991; Sauer et al., 2010). The population of potential IVIS users is very large and will include a diverse range of physical, intellectual and perceptual characteristics (Harvey et al., in press-b). In driving, two of the most important user characteristics are age and experience, with older and novice drivers considered particularly vulnerable to the problems associated with balancing the information requirements of primary and secondary tasks (Amditis et al., 2010). These characteristics must be represented in a valid evaluation of IVIS usability. User trials are generally costly and time consuming, and it can often be difficult to recruit representative samples of adequate size. It is likely that automotive manufacturers will not always have the resources to run extensive user trials and therefore a supplementary type of evaluation is needed.

3.4.2. Usability evaluation with experts

Desktop methods are applied by expert evaluators who aim to identify the causes of usability problems by analysing the structure of tasks and the system interface (Doubleday et al., 1997). This allows predictions about the likely usability issues to be made. Evaluators require a certain level of expertise to apply these desktop methods but this can normally be gained in a few hours of familiarisation and practice. The low costs associated with expert evaluations are one of the main advantages of desktop methods, although it is also thought that experts can offer a new and unbiased perspective of a system and are able to provide valuable insights based on their experiences with similar products (Rennie, 1981).

4. IVIS usability evaluation methods

A review of over 70 usability evaluation methods was undertaken and thirteen were selected to make up the final evaluation toolkit.


Table 3. Matrix of usability evaluation methods matched with IVIS usability criteria.

Methods:
- Desktop methods: HTA, Multimodal CPA, SHERPA, Heuristic analysis, Layout analysis.
- Experimental methods: measures of primary task interference (Lateral control, Longitudinal control, Event detection, Visual behaviour, DALI) and secondary task measures (Secondary task times, Secondary task errors, SUS).

IVIS usability criteria (grouped by context of use factor; each 'x' marks a method used to measure that criterion):

Safety
- Sustained effectiveness: x x x x x x x x x x x x
- Sustained efficiency: x x x x x x x x x x x x
- Interference: x x x x x x x x x x

Varying environmental conditions
- Effectiveness under varying conditions: x x x x x x x x x

Range of users
- Adaptability: x
- User compatibility: x x x x x x x x x

Training provision
- Learnability: x x x x x x x x x
- Initial effectiveness: x x x x x x x x x
- Initial efficiency: x x x x x x x x x

Varying frequency of use
- Long-term and short-term satisfaction: x
- Memorability: x x x

Uptake
- Initial satisfaction: x
- Perceived usefulness: x



Table 4. Information outputs, application stage, resources and people required for HTA.

Information: A breakdown of the structure of tasks into individual operations, e.g. move hand to controller, visually locate button. Main use is as a starting point for other methods, although also useful in the assessment of efficiency and effectiveness and to examine if/how tasks are designed to adapt to users.
When to test: Early in the design process, as a precursor to other desktop methods.
Resources: Access to the system under investigation or detailed specification, paper/pen. A relatively time-consuming method (approx. 2-4 h data collection, 6-12 h analysis per IVIS), low associated cost.
People: Expert evaluator for data collection and analysis.

Table 6. Information outputs, application stage, resources and people required for SHERPA.

Information: Predicted error types, error rates, probability and criticality of errors, error mitigation strategies (i.e. design recommendations). Errors can be used to predict efficiency, effectiveness and interference between primary and secondary tasks.
When to test: Useful for predictions of error at an early stage, although detailed specification of system and tasks is required to produce the initial HTA.
Resources: Access to the system under investigation, paper/pen. A relatively time-consuming method (approx. 2-4 h data collection, 8-16 h analysis per IVIS), low associated cost.
People: Expert evaluator for data collection and analysis.


This selection was based on the suitability of each method according to the four principles of method selection defined earlier. The suitability of methods to the IVIS task-user-system interaction, the specific context of use and the set of usability criteria (defined by Harvey et al., in press-a, and discussed in Section 2) was also an important factor in the selection process. It is unlikely that one single method will be capable of reflecting a complete picture of the usability of a particular system and many authors recommend using a range of different methods to produce the most comprehensive assessment of usability (e.g. Annett, 2002; Bouchner et al., 2007; Hornbæk, 2006; Kantowitz, 1992). The thirteen methods selected for inclusion in the IVIS usability evaluation framework are presented in Table 3 in a matrix which matches each method with the usability criteria that it is used to measure. Each method is also presented and described in relation to the four method selection principles in the following sections.

4.1. Desktop evaluation methods

Desktop evaluation methods are used to compute symbolic models of the task-user-system interaction via paper-based or computer-based simulations. These models are used to predict IVIS usability parameters such as interaction times and potential errors. Five desktop methods were selected for the IVIS usability evaluation framework: hierarchical task analysis (HTA), multimodal critical path analysis (CPA), systematic human error reduction and prediction approach (SHERPA), heuristic analysis and layout analysis.

4.1.1. Hierarchical task analysis (HTA)

HTA is used to produce an exhaustive description of tasks in a hierarchical structure of goals, subgoals, operations and plans (Stanton et al., 2005; Hodgkinson and Crawshaw, 1985). Operations describe the actions performed by people interacting with a system or by the system itself (Stanton, 2006) and plans explain the conditions necessary for these operations (Kirwan and Ainsworth, 1992).

Table 5. Information outputs, application stage, resources and people required for CPA.

Information: Predicted task times, modal conflicts, interference from secondary tasks. Task times can be used to assess the efficiency of interaction.
When to test: Useful for predictions of usability at an early stage, although detailed specification of system and tasks is required to produce the initial HTA.
Resources: Access to the system under investigation or detailed specification, database of operation times, paper/pen. A relatively time-consuming method (approx. 2-4 h data collection, 8-16 h analysis per IVIS), low associated cost.
People: Expert evaluator for data collection and analysis.

HTA is a task analysis method and in most cases needs to be combined with methods of evaluation in order to produce meaningful results (Stanton and Young, 1998; Stanton, 2006). The important features of HTA are summarised in Table 4.
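To make the hierarchical structure described above more concrete, the sketch below shows one way a hypothetical IVIS task could be represented and summarised programmatically. The goal hierarchy, plans and operations are illustrative assumptions for a made-up 'enter navigation destination' task, not the authors' actual analysis; the operation count simply mirrors the kind of summary metric reported later in Table 17.

```python
# Minimal sketch of an HTA-style task hierarchy for a hypothetical IVIS task.
# Goals, plans and operations are invented for illustration only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    plan: str = ""                       # plan: conditions/order for performing children
    children: List["Node"] = field(default_factory=list)

    def total_operations(self) -> int:
        """Count leaf nodes, i.e. individual operations such as 'press button'."""
        if not self.children:
            return 1
        return sum(child.total_operations() for child in self.children)

hta = Node("0. Enter navigation destination", plan="Do 1, then 2, then 3", children=[
    Node("1. Open navigation menu", plan="Do 1.1 then 1.2", children=[
        Node("1.1 Move hand to controller"),
        Node("1.2 Select 'Navigation' menu item"),
    ]),
    Node("2. Enter address", plan="Repeat 2.1 until address complete", children=[
        Node("2.1 Select character on keypad"),
    ]),
    Node("3. Confirm destination"),
])

print(hta.total_operations())  # a simple metric that can be compared across interface designs
```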

4.1.2. Multimodal critical path analysis (CPA)

Multimodal CPA is used to model the time taken to perform specific tasks and to evaluate how this impacts on other related tasks performed concurrently or subsequently (Baber and Mellor, 2001). For example, it can be used to identify where non-completion of one task may lead to failure to complete another task (Kirwan and Ainsworth, 1992). CPA is useful for the type of multimodal interactions created by IVISs because, unlike other task modelling methods such as the keystroke level model (KLM), it can highlight conflicts between primary and secondary tasks occurring in parallel and in the same mode (Wickens, 1991). The important features of CPA are summarised in Table 5.
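The core of the critical path calculation can be illustrated with a short sketch, assuming a toy network of operations; the operation names, durations, modalities and dependencies below are invented and are not drawn from the paper. Operations in different modalities may proceed in parallel, and the predicted total task time is the longest dependency-respecting path through the network.

```python
# Minimal sketch of a CPA-style task time prediction over an invented operation network.
from functools import lru_cache

# operation -> (duration in ms, modality, operations it must wait for)
operations = {
    "look_at_screen":   (300, "visual",    []),
    "move_hand":        (400, "physical",  []),
    "locate_menu_item": (350, "cognitive", ["look_at_screen"]),
    "press_button":     (250, "physical",  ["move_hand", "locate_menu_item"]),
    "check_feedback":   (300, "visual",    ["press_button"]),
}

@lru_cache(maxsize=None)
def finish_time(op: str) -> int:
    """Earliest finish time: the operation's duration plus the latest finish
    time of anything it depends on."""
    duration, _modality, deps = operations[op]
    return duration + max((finish_time(d) for d in deps), default=0)

total_task_time = max(finish_time(op) for op in operations)
print(f"Predicted total task time: {total_task_time} ms")  # length of the critical path
```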

4.1.3. Systematic human error reduction and prediction approach (SHERPA)

SHERPA is a human error identification technique designed to identify the types of errors that may occur in performing a task and the consequences of those errors, and to generate strategies to prevent or reduce the impact of those errors (Baber and Stanton, 1996; Lyons, 2009). SHERPA can be used to predict where IVIS design issues could cause the driver to make errors in secondary task performance and to develop design improvements specifically to improve aspects of IVIS usability. The important features of SHERPA are summarised in Table 6.

4.1.4. Heuristic analysis

In an heuristic analysis, experts judge aspects of a system or device according to a checklist of principles or 'heuristics' (Cherri et al., 2004; Nielsen, 1993; Stanton et al., 2005; Stanton and Young, 2003). It is usually used to identify usability problems, rather than to assess potential user performance (Burns et al., 2005; Cherri et al., 2004; Jeffries et al., 1991; Nielsen and Phillips, 1993). An advantage of using this checklist approach is that evaluators can be guided towards the aspects of a system which have most influence on usability according to the pre-defined criteria.

Table 7. Information outputs, application stage, resources and people required for heuristic analysis.

Information: Estimated performance of the system against a list of pre-determined usability criteria, list of usability issues.
When to test: Any stage, although best applied early in the design process to target major usability problems.
Resources: Access to system or prototype, appropriate usability checklist, pen/paper. Relatively low time demands (approx. 1 h data collection, 1 h analysis per IVIS), low associated cost.
People: Expert evaluator for data collection and analysis.


Table 8. Information outputs, application stage, resources and people required for layout analysis.

Information: Redesigned layout of menu screens for optimal frequency, importance and sequence of use; number of changes can be used as a quantitative measure. Useful in improving the effectiveness and efficiency of IVIS menu screens.
When to test: Requires knowledge of menu screen layouts from design specifications or existing/prototype systems. Can be used at any stage because only relatively small design changes are identified.
Resources: Access to detailed specifications of menu screens/existing system/prototype, pen/paper. Low-moderate time demands (1-2 h data collection, 1 h analysis per menu screen), low associated cost.
People: Expert evaluator for data collection and analysis.

Table 10. Information outputs, application stage, resources and people required for longitudinal driving control measures.

Information: Speed and following distances. Poor longitudinal control would result from interference from secondary task interactions, which could indicate low levels of effectiveness, efficiency, user compatibility, learnability and memorability of the IVIS.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment for recording longitudinal position. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.


There are a number of existing checklists and guidelines available for use as part of an heuristic analysis (Alliance of Automobile Manufacturers, 2006; Bhise et al., 2003; Commission of the European Communities, 2008; Green et al., 1994; Japan Automobile Manufacturers Association, 2004; Stevens et al., 2002, 1999; The European Conference of Ministers of Transport, 2003) and each has different merits according to the specific features of an evaluation. The principles for method selection should be used to guide the selection of appropriate checklists on a case-by-case basis. The important features of heuristic analysis are summarised in Table 7.

4.1.5. Layout analysis

Layout analysis is a technique used to evaluate an existing interface based on the grouping of related functions (Stanton et al., 2005; Stanton and Young, 2003). It can assist in the restructuring of an interface according to the users' structure of the task. Functions are grouped according to three factors: frequency, importance and sequence of use (Stanton et al., 2005). Layout analysis may be useful in optimising the efficiency of IVIS designs because this will be dependent, in part, on the physical layout of buttons and menu items. The important features of layout analysis are summarised in Table 8.

4.2. Experimental evaluation methods

Experimental methods measure objective and subjective levels of performance and workload of users interacting with an IVIS. They also evaluate subjective satisfaction and attitudes towards a particular system. In this evaluation framework, experimental methods have been classified as objective or subjective.

Table 9. Information outputs, application stage, resources and people required for lateral driving control measures.

Information: Lane keeping and steering measures. Poor lateral control would result from interference from secondary task interactions, which could indicate low levels of effectiveness, efficiency, user compatibility, learnability and memorability of the IVIS.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment for recording lateral position. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.

4.2.1. Objective evaluation methods

An important criterion for IVIS usability relates to the interference with primary driving caused by interacting with secondary tasks. Primary task performance can be used as a measure of this interference because a driver who is distracted by the IVIS is likely to exhibit degraded driving performance. This degraded driving can be objectively measured by recording lateral and longitudinal control and event detection. Visual behaviour is an objective measure of the proportion of time a driver spends looking at the road compared to the IVIS. Usability can also be evaluated by measuring secondary task performance. This gives an objective measure of the effectiveness and efficiency of the IVIS under investigation. Comparing these objective measures for driving with an IVIS against driving without will indicate the extent to which the usability of the IVIS is interfering with primary driving. Two objective measures of secondary task interaction were selected for inclusion in the framework: secondary task times and secondary task errors. These measures will indicate the effectiveness and efficiency with which secondary tasks can be performed via the IVIS. These measures should be compared across conditions in which the IVIS is used in isolation and simultaneously with the driving task.

4.2.1.1. Lateral driving control. Lateral control is an objective measure which can be used to evaluate the effects of secondary task interaction on primary driving performance (Cherri et al., 2004; Young et al., 2009). When a driver is distracted from the primary task, particularly by visually demanding secondary tasks, their ability to maintain lateral position on the road is adversely affected (Young et al., 2011, 2009; Wittmann et al., 2006). The important features of this measure are summarised in Table 9.
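As a concrete illustration, one widely reported lane-keeping metric is the standard deviation of lateral position (SDLP). The paper refers only generally to 'lane keeping and steering measures', so SDLP is offered here as an assumed example, and the sample values are invented.

```python
# Minimal sketch: standard deviation of lateral position (SDLP) from invented samples.
import statistics

# lateral offsets from the lane centre (m), sampled while the driver interacts with an IVIS
lateral_position_m = [0.05, 0.12, -0.08, 0.20, 0.31, -0.15, 0.02, 0.18]

sdlp = statistics.stdev(lateral_position_m)
print(f"SDLP: {sdlp:.3f} m")  # a higher SDLP than a baseline (no-IVIS) drive suggests interference
```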

Table 11. Information outputs, application stage, resources and people required for visual behaviour measures.

Information: Eyes off road time. This is a measure of the visual distraction caused by secondary tasks.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment for tracking driver eye movements. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.


Table 12. Information outputs, application stage, resources and people required for event detection measures.

Information: Number of missed/detected events, incorrect responses, reaction time/distance. This is a measure of the interference from secondary tasks.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment for measuring event detection/response time, etc. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.

Table 14. Information outputs, application stage, resources and people required for secondary task error measures.

Information: Number of errors, error types. These measures can be used to evaluate effectiveness, efficiency, user compatibility, learnability and memorability.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment/observer for recording errors. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.


4.2.1.2. Longitudinal driving control. Longitudinal control is an objective measure relating to the speed of the vehicle (Angell et al., 2006; Cherri et al., 2004; Wittmann et al., 2006). Drivers tend to display greater variations in speed and/or reduced speeds when manually interacting with a secondary task whilst driving (Young et al., 2009). Longitudinal measures can therefore be used to measure the effect of secondary task interaction on driving performance. The important features of this measure are summarised in Table 10.

4.2.1.3. Visual behaviour. Visual behaviour can be evaluated by measuring the amount of time the driver's eyes are looking at the road ahead and comparing this to the time spent looking elsewhere (e.g. at the IVIS). This is an objective measure of the interference caused by secondary tasks. If the system is visually distracting then the driver will spend a significant proportion of the total time looking at it, rather than at the road (Chiang et al., 2004; Noy et al., 2004). The important features of this measure are summarised in Table 11.
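A minimal sketch of how eyes-off-road time could be computed from coded glance data follows; the glance targets and durations are invented for illustration only.

```python
# Minimal sketch: proportion of task time spent looking away from the road,
# computed from an invented sequence of (glance target, duration in seconds).
glances = [
    ("road", 2.4), ("ivis", 1.1), ("road", 3.0), ("ivis", 1.6),
    ("mirror", 0.4), ("road", 2.2), ("ivis", 0.9),
]

total_time = sum(duration for _, duration in glances)
eyes_off_road = sum(duration for target, duration in glances if target != "road")

print(f"Eyes-off-road: {eyes_off_road:.1f} s "
      f"({100 * eyes_off_road / total_time:.0f}% of a {total_time:.1f} s task)")
```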

4.2.1.4. Event detection. A driver's ability to detect and respond to events and hazards in the driving environment can be used as a measure of the interference from secondary tasks (Liu et al., 2009), as it has been shown to be negatively affected by the use of IVISs (Young et al., 2009; Victor et al., 2009). Event detection can be measured via the number of missed events compared to detected events, the number of incorrect responses to events, the response time and reaction distance (Young et al., 2009). The important features of this measure are summarised in Table 12.

4.2.1.5. Secondary task times.

Table 13. Information outputs, application stage, resources and people required for secondary task time measures.

Information: Total task times, individual operation times. These measures can be used to evaluate effectiveness, efficiency, interference, user compatibility, learnability and memorability.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), equipment for recording task/operation times. High time demands (users are exposed to one or more systems, under one or more testing conditions), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials.

Monitoring the time a user takes to perform a secondary task gives an objective measure of the time spent away from the primary task, i.e. attending to the road ahead. The more time spent on the secondary task, the less time available for attention to driving and therefore the higher the risk to safe driving (Green, 1999). Task time can also be a measure of the effectiveness and efficiency of the interaction enabled by the in-vehicle device (Noy et al., 2004): the more time required to perform a secondary task, the less effective and efficient the interface is. The important features of this measure are summarised in Table 13.

4.2.1.6. Secondary task errors. The number and types of errors in the interaction with an IVIS can be used to evaluate the effectiveness of the system design. Errors include pressing incorrect buttons and selecting incorrect functions. Task time compared with number of errors is a useful objective measure of efficiency because it provides information about the quality of the interaction. A usable product will be one which, among other things, enables relatively low task completion times combined with minimum errors. The important features of this measure are summarised in Table 14.
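The comparison described above, secondary task times and errors with the IVIS used in isolation versus while driving, can be sketched as follows; the condition names and values are invented for illustration.

```python
# Minimal sketch: comparing mean secondary task times and errors across two
# invented test conditions (IVIS alone vs. IVIS while driving).
import statistics

task_times_s = {
    "ivis_only":     [12.1, 10.8, 13.4, 11.9],
    "while_driving": [18.6, 17.2, 21.0, 19.4],
}
errors = {
    "ivis_only":     [0, 1, 0, 1],
    "while_driving": [2, 1, 3, 2],
}

for condition in task_times_s:
    mean_time = statistics.mean(task_times_s[condition])
    mean_errors = statistics.mean(errors[condition])
    print(f"{condition}: mean time {mean_time:.1f} s, mean errors {mean_errors:.1f}")

# A marked increase in times and errors under the dual-task condition indicates
# interference with the primary driving task.
```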

4.2.2. Subjective evaluation methods

Subjective methods are used to evaluate the users' perceptions of their primary and secondary task performance and their attitudes towards the IVIS under investigation. Workload can indicate the level of interference caused by interacting with an IVIS. Workload can be measured subjectively, based on self-ratings from users. The level of system usability, with particular reference to user satisfaction, has to be measured subjectively by asking users to rate their experiences with a product.

4.2.2.1. Driving activity load index (DALI). DALI is a method for measuring users' subjective workload.

Table 15. Information outputs, application stage, resources and people required for DALI.

Information: Users' subjective ratings of six aspects of perceived workload. Workload can indicate the effectiveness and efficiency of task performance, primary/secondary task interference, compatibility of the system with different users, and learnability.
When to test: Relatively late in the design process when access to a full prototype is available.
Resources: Access to a full prototype/complete system, testing environment (lab/real world), test vehicle (simulated/real vehicle), questionnaire, recording material. High time demands (users are exposed to one or more systems, under one or more testing conditions, then need to answer the questionnaire for each condition), relatively high associated cost.
People: Representative sample of the user population, experimenters to run user trials and administer the questionnaire.


Table 16. Information outputs, application stage, resources and people required for SUS.

Information: Users' subjective ratings of ten aspects of system usability. The ten SUS rating scales cover many aspects of the usability of in-vehicle devices and are particularly useful in addressing the issue of uptake, which can only be evaluated subjectively.
When to test: Mid-late in the design process when access to a part/full prototype is available.
Resources: Access to a part/full prototype/complete system, questionnaire, recording materials. SUS can be used to evaluate an IVIS in isolation or situated in the vehicle. May also require testing environment (lab/real world), test vehicle (simulated/real vehicle). Medium-high time demands (depending on the test set-up, users may be exposed to one or more systems, under one or more testing conditions, then answer the questionnaire for each condition), relatively high associated cost.
People: Representative sample of user population, experimenters to run user trials and administer questionnaire.

Fig. 2. IVIS usability evaluation framework.


DALI is based on the NASA-TLX workload measurement scale and is designed specifically for the driving context (Pauzié, 2008). Unlike NASA-TLX, DALI includes a rating for interference with the driver's state caused by interaction with a supplementary task. Participants are asked to rate the task, post-trial, along six rating scales: effort of attention, visual demand, auditory demand, temporal demand, interference and situational stress (Johansson et al., 2004). The important features of DALI are summarised in Table 15.

4.2.2.2. System usability scale (SUS). The system usability scale is a subjective method of evaluating users' attitudes towards a system, consisting of ten statements against which participants rate their level of agreement on a 5-point Likert scale (Brooke, 1996). A single usability score is computed from the ratings and this is useful for comparing participants' views across different systems (Bangor et al., 2008). Examples of statements include 'I needed to learn a lot of things before I could get going with this system' and 'I think I would like to use this system frequently'. These cover two of the main criteria for IVIS usability: learnability and satisfaction (Brooke, 1996). SUS is applicable to a wide range of interface technologies and is a quick and easy method to use (Bangor et al., 2008). The important features of SUS are summarised in Table 16.
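For reference, the conventional SUS scoring rule (Brooke, 1996) converts the ten ratings into a single 0-100 score: odd-numbered items contribute their rating minus 1, even-numbered items contribute 5 minus their rating, and the sum is multiplied by 2.5. The sketch below applies this rule to an invented set of ratings from one participant.

```python
# Minimal sketch of standard SUS scoring; the ratings below are invented.
def sus_score(ratings):
    """ratings: ten responses (1-5), in questionnaire order."""
    assert len(ratings) == 10
    total = 0
    for i, rating in enumerate(ratings, start=1):
        total += (rating - 1) if i % 2 == 1 else (5 - rating)
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 2]))  # -> 82.5 for this invented participant
```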



Table 17. Quantitative comparisons between two IVISs based on the results of five desktop evaluation methods.

Method | Touch screen | Remote controller | Best performance?
HTA | 125 total operations | 113 total operations | Remote controller
CPA | 63080 ms total task time | 78430 ms total task time | Touch screen
SHERPA | 6 highly rated errors | 7 highly rated errors | Touch screen
Heuristic analysis | 13 -ive/7 +ive issues | 11 -ive/8 +ive issues | Remote controller
Layout analysis | 11 layout changes across two menu screens | 18 layout changes across two menu screens | Touch screen



5. Evaluation framework

A framework for IVIS usability evaluation is illustrated in Fig. 2. It comprises a number of steps which are linked in an iterative cycle of design-evaluation-redesign with the overall aim of improving the usability of IVISs. The framework begins with a need, which is defined in this case as a usable IVIS, which enhances the overall driving experience whilst meeting the needs of the driver. Next, the criteria which would need to be met to achieve this goal of a usable IVIS were defined. Harvey et al. (in press-a) stated that any definition of usability had to be context specific; in other words, criteria needed to be developed for the particular tasks, users and systems which were the subject of the investigation. This was done for the context of IVISs to define thirteen usability criteria (Harvey et al., in press-a). These usability criteria prescribe the type of information that is needed in order to evaluate the usability of an IVIS and this was used to guide the selection of methods which were most appropriate for evaluating usability. In the framework, methods have been categorised as desktop or experimental and the individual methods in each category have been presented and discussed in this paper. Desktop methods generally have low time and resource demands, making them more applicable earlier in the evaluation and design process. It is recommended that desktop methods are applied to predict IVIS usability in order to determine if a particular design is worth developing further. Systems which are predicted to perform well against the IVIS usability criteria can then be taken forward into the next stage of the framework in which experimental evaluation methods are applied. These evaluation stages should then be repeated where necessary in order to refine the design of an IVIS until the usability criteria are met. This iteration validates the findings of each stage of the framework, ensuring that the evaluation process is capable of measuring what it is supposed to measure.

5.1. Evaluation validity

There are four types of validity illustrated in the evaluation framework in Fig. 2: construct, content, predictive and concurrent (Diaper and Stanton, 2004; Stanton and Young, 1999a). The appropriateness of the methods to the thirteen IVIS usability criteria will determine the construct validity of the evaluation. The selection of methods may need to be revised according to the results obtained and their significance to the usability criteria. The results of applying the methods in an evaluation will also indicate the level of content validity in the process, because the usability of the methods themselves will be assessed. Comparing the results obtained from applying the desktop and experimental methods will determine the concurrent and predictive validity of the particular selection of methods, and this selection may need to be revised in further iterations of the framework to increase validity. The following section presents a case study which describes the application of desktop methods as part of this IVIS usability framework.
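The paper does not prescribe a statistic for assessing predictive or concurrent validity, but one plausible way to quantify the agreement between desktop predictions and experimental results is a rank correlation between predicted and observed task measures, as sketched below. The task names and all timings are hypothetical.

```python
# One possible (not prescribed) way to quantify predictive validity: correlate
# desktop (e.g. CPA) task-time predictions with times observed in experimental
# trials. Task names and timings are hypothetical.
from scipy.stats import spearmanr

predicted_ms = {"enter destination": 14200, "tune radio": 5200,
                "adjust climate": 4100, "answer phone": 3600}
observed_ms  = {"enter destination": 16900, "tune radio": 6000,
                "adjust climate": 4800, "answer phone": 3900}

tasks = sorted(predicted_ms)
rho, p_value = spearmanr([predicted_ms[t] for t in tasks],
                         [observed_ms[t] for t in tasks])
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")  # high rho suggests good predictive validity
```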

6. Case study: desktop evaluation

The five desktop methods selected for the IVIS usability evaluation framework (HTA, multimodal CPA, SHERPA, heuristic analysis and layout analysis) were used to evaluate two existing IVISs: a touch screen system and a remote controller with a separately located display screen. The touch screen IVIS consisted of a screen which was located within reach of the driver and allowed direct inputs onto the display. The remote controller system comprised a screen located at the driver's eye level in the centre console and a separate controller which functioned in a similar way to a joystick, moving a pointer around the screen. Data on the two IVISs were collected by an expert evaluator based on a set of tasks performed with each system. The results were quantified and compared in order to identify which system had better usability according to each of the evaluation methods. The results of this quantitative analysis are shown in Table 17.

These results showed that the methods were able to discriminate between the two IVISs; for example, according to the multimodal CPA, the touch screen enabled shorter task times than the remote controller, and SHERPA predicted that the touch screen would produce fewer highly rated errors than the remote controller. The findings showed that HTA and CPA were most useful for generating objective data with which to make direct comparisons between the usability of the two systems. SHERPA, heuristic analysis and layout analysis, on the other hand, proved more valuable as tools for generating possible design improvements and required a more subjective interpretation.
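Multimodal CPA derives its total task time prediction from the longest (critical) path through a network of visual, manual and cognitive operations, some of which can proceed in parallel (Baber and Mellor, 2001). As an illustration only, the sketch below computes such a critical path for a hypothetical "select a menu item" interaction; the operation names, durations and dependencies are invented and are not taken from the evaluated systems.

```python
# Minimal sketch of the critical path calculation underlying multimodal CPA:
# total task time is the longest path through a network of operations.
# Operation names, durations and dependencies are hypothetical.
from functools import lru_cache

OPERATIONS = {                    # duration in ms, predecessor operations
    "glance at screen":    (400, ()),
    "locate menu item":    (550, ("glance at screen",)),
    "decide on item":      (300, ("locate menu item",)),
    "move hand to screen": (650, ("glance at screen",)),   # manual action in parallel
    "press menu item":     (250, ("decide on item", "move hand to screen")),
}

@lru_cache(maxsize=None)
def earliest_finish(operation):
    """Earliest completion time of an operation, given its predecessors."""
    duration, predecessors = OPERATIONS[operation]
    start = max((earliest_finish(p) for p in predecessors), default=0)
    return start + duration

total_task_time = max(earliest_finish(op) for op in OPERATIONS)
print(f"Predicted total task time: {total_task_time} ms")   # -> 1500 ms
```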

This particular set of five desktop methods has been shown to be capable of predicting various aspects of usability, such as effectiveness, efficiency, learnability and performance under varying environmental conditions, demonstrating high construct validity. In terms of content validity, the methods had all been used in previous studies and had credibility in the existing literature (see, for example: Annett, 2004; Baber and Stanton, 1996; Baber, 2005; Baber and Mellor, 2001; Harris et al., 2005; Kirwan and Ainsworth, 1992; Kirwan, 1992; Nielsen, 1992; Stanton and Young, 2003, 1999a; Stanton et al., 2005; Stanton and Baber, 2008). Next, experimental methods will be applied to an evaluation of IVISs to assess the predictive and concurrent validity of the proposed framework.

7. Conclusion

The main aim of this project was to develop a framework to guide the evaluation of the usability of IVISs. In developing this framework we have learnt a number of valuable lessons which will be of interest to anyone attempting to develop and apply their own evaluation process to a particular system. First, a definition of usability is necessary to provide an overall goal for the evaluation. This definition must be specific to the context of use of the system in question, which in this case was IVISs. The definition then provides a foundation from which to develop a set of usability criteria, against which specific aspects of usability can be evaluated. Next, evaluation methods need to be selected according to their suitability to the usability criteria for the particular system under investigation. A literature review of evaluation methods and the method selection process was conducted as part of this study, and four general principles for method selection were identified. These have been presented here as a useful guide to selecting appropriate evaluation methods according to the type of information required, the stage of application, the resources available and the personnel involved in the evaluation.

The focus of this evaluation is IVISs, and thirteen methods were selected as appropriate for these systems. These thirteen evaluation methods have been presented and discussed in this paper. Five of these methods have already been applied in a desktop evaluation of two IVISs, and the results of this evaluation have also been presented. The results of this evaluation have shown that the five desktop methods have suitable levels of construct and content validity. Further work will include the application of eight experimental methods to the evaluation of IVISs, which will validate the predictions made by the desktop methods. Finally, a framework consisting of desktop and experimental methods, validated and refined in an iterative process, has been developed to illustrate the process which evaluators should follow to ensure a comprehensive evaluation of usability. This structured process will enable the users of this framework to improve the usability of IVISs and enhance the overall driving experience.

Acknowledgement

This research is sponsored by Jaguar Cars and the Engineering and Physical Sciences Research Council (EPSRC). The authors would like to thank the two anonymous reviewers for their constructive comments on earlier versions of this article.

References

Alliance of Automobile Manufacturers, 2006. Statement of Principles, Criteria and Verification Procedures on Driver Interactions with Advanced In-vehicle Information and Communication Systems. AAM, Washington, DC.
Amditis, A., Pagle, K., Joshi, S., Bekiaris, E., 2010. Driver-vehicle-environment monitoring for on-board driver support systems: lessons learned from design and implementation. Appl. Ergon. 41, 225–235.
Angell, L., Auflick, J., Austria, P.A., Kochhar, D., Tijerina, L., Biever, W., Diptiman, T., Hogsett, J., Kiger, S., 2006. Driver Workload Metrics Task 2 Final Report. Crash Avoidance Metrics Partnership, Farmington Hills, MI.
Annett, J., 2002. Subjective rating scales: science or art? Ergonomics 45, 966–987.
Annett, J., 2004. Hierarchical task analysis. In: Diaper, D., Stanton, N.A. (Eds.), The Handbook of Task Analysis for Human–Computer Interaction. Lawrence Erlbaum, London, pp. 67–82.
Au, F.T.W., Baker, S., Warren, I., Dobbie, G., 2008. Automated usability testing framework. In: 9th Australasian User Interface Conference, Wollongong, Australia, 22–25 January, 2008.
Baber, C., Mellor, B., 2001. Using critical path analysis to model multimodal human–computer interaction. Int. J. Hum. Comput. St. 54, 613–636.
Baber, C., Stanton, N.A., 1996. Human error identification techniques applied to public technology: predictions compared with observed use. Appl. Ergon. 27, 119–131.
Baber, C., 2005. Critical path analysis for multimodal activity. In: Stanton, N., Hedge, A., Brookhuis, K., Salas, E., Hendrick, H. (Eds.), Handbook of Human Factors and Ergonomics Methods. CRC Press, London, pp. 41-1–41-8.
Bangor, A., Kortum, P.T., Miller, J.T., 2008. An empirical evaluation of the system usability scale. Int. J. Hum. Comput. Int. 24, 574–594.
Bevan, N., 2001. International standards for HCI and usability. Int. J. Hum. Comput. St. 55, 532–552.
Bhise, V.D., Dowd, J.D., Smid, E., 2003. A comprehensive HMI evaluation process for automotive cockpit design. In: SAE World Congress, Detroit, MI, 3–6 March, 2003.
Bouchner, P., Novotný, S., Piekník, R., 2007. Objective methods for assessments of influence of IVIS (in-vehicle information systems) on safe driving. In: 4th International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Stevenson, WA, 9–12 July, 2007.
Brooke, J., 1996. SUS: a 'quick and dirty' usability scale. In: Jordan, P.W., Thomas, B., Weerdmeester, B.A., McClelland, I.L. (Eds.), Usability Evaluation in Industry. Taylor and Francis, London, pp. 189–194.
Burns, P.C., Trbovich, P.L., Harbluk, J.L., McCurdie, T., 2005. Evaluating one screen/one control multifunction devices in vehicles. In: 19th International Technical Conference on the Enhanced Safety of Vehicles, Washington, DC, 6–9 June, 2005.
Butler, K.A., 1996. Usability engineering turns 10. Interactions, 58–75.
Butters, L.M., Dixon, R.T., 1998. Ergonomics in consumer product evaluation: an evolving process. Appl. Ergon. 29, 55–58.
Cellario, M., 2001. Human-centered intelligent vehicles: toward multimodal interface integration. IEEE Intell. Trans. Syst. 16, 78–81.
Chamorro-Koc, M., Popovic, V., Emmison, M., 2008. Human experience and product usability: principles to assist the design of user–product interactions. Appl. Ergon. 40, 648–656.

Cherri, C., Nodari, E., Toffetti, A., 2004. Review of Existing Tools and Methods: AIDE Deliverable 2.1.1. VTEC, Gothenburg.
Chiang, D.P., Brooks, A.M., Weir, D.H., 2004. On the highway measures of driver glance behavior with an example automobile navigation system. Appl. Ergon. 35, 215–223.
Commission of the European Communities, 2008. Commission Recommendation of 26/V/2008 on Safe and Efficient In-vehicle Information and Communication Systems: Update of the European Statement of Principles on Human–Machine Interface. Commission of the European Communities, Brussels.
Daimon, T., Kawashima, H., 1996. New viewpoints for the evaluation of in-vehicle information systems: applying methods in cognitive engineering. JSAE Rev. 17, 151–157.
Diaper, D., Stanton, N.A., 2004. Wishing on a sTAR: the future of task analysis. In: Diaper, D., Stanton, N.A. (Eds.), The Handbook of Task Analysis for Human–Computer Interaction. Lawrence Erlbaum, London, pp. 603–619.
Doubleday, A., Ryan, M., Springett, M., Sutcliffe, A., 1997. A comparison of usability techniques for evaluating design. In: Designing Interactive Systems Conference, Amsterdam, 18–20 August, 1997.
Fastrez, P., Haué, J.-B., 2008. Editorial: designing and evaluating driver support systems with the user in mind. Int. J. Hum. Comput. St. 66, 125–131.
Gould, J.D., Lewis, C., 1985. Designing for usability: key principles and what designers think. In: Denning, P.J. (Ed.), Communications of the ACM. ACM, New York, pp. 300–311.
Green, P., Levison, W., Paelke, G., Serafin, C., 1994. Suggested Human Factors Design Guidelines for Driver Information Systems. The University of Michigan Transportation Research Institute (UMTRI), Ann Arbor, Michigan.
Green, P., 1999. The 15-second rule for driver information systems. In: ITS America Ninth Annual Meeting, Washington, DC, April, 1999.
Greenberg, S., Buxton, B., 2008. Usability evaluation considered harmful (some of the time). In: 26th Conference on Human Factors in Computing Systems, Florence, 5–10 April, 2008.
Harris, D., Stanton, N.A., Marshall, A., Young, M.S., Demagalski, J., Salmon, P., 2005. Using SHERPA to predict design-induced error on the flight deck. Aerosp. Sci. Technol. 9, 525–532.
Harvey, C., Stanton, N.A., Pickering, C.A., McDonald, M., Zheng, P. Context of use as a factor in determining the usability of in-vehicle devices. Theor. Issues Ergonomics Sci., in press-a.
Harvey, C., Stanton, N.A., Pickering, C.A., McDonald, M., Zheng, P. In-vehicle information systems to meet the needs of drivers. Int. J. Hum. Comput. Int., in press-b.
Hewett, T.A., 1986. The role of iterative evaluation in designing systems for usability. In: Harrison, M.D., Monk, A.F. (Eds.), People and Computers: Designing for Usability. Proceedings of the Second Conference of the British Computer Society Human Computer Special Interest Group, University of York, 23–26 September, 1986.
Hodgkinson, G.P., Crawshaw, C.M., 1985. Hierarchical task analysis for ergonomics research. Appl. Ergon. 16, 289–299.
Hornbæk, K., 2006. Current practice in measuring usability. Int. J. Hum. Comput. St. 64, 79–102.
International Organization for Standardization, 1998. ISO 9241. Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs) – Part 11: Guidance on Usability.
Japan Automobile Manufacturers Association, 2004. Guideline for In-vehicle Display Systems, Version 3.0. JAMA, Tokyo.
Jeffries, R., Miller, J.R., Wharton, C., Uyeda, K.M., 1991. User interface evaluation in the real world: a comparison of four techniques. In: 8th Conference on Human Factors in Computing Systems, Pittsburgh, Pennsylvania, 15–20 May, 1999.
Johansson, E., Engström, J., Cherri, C., Nodari, E., Toffetti, A., Schindhelm, R., Gelau, C., 2004. Review of Existing Techniques and Metrics for IVIS and ADAS Assessment. Volvo Technology Corporation, Gothenburg.
Johnson, G.I., Clegg, C.W., Ravden, S.J., 1989. Towards a practical method of user interface evaluation. Appl. Ergon. 20, 255–260.
Kantowitz, B.H., 1992. Selecting measures for human factors research. Hum. Factors 34, 387–398.
Kirwan, B., Ainsworth, L.K., 1992. A Guide to Task Analysis, first ed. Taylor and Francis, London.
Kirwan, B., 1992. Human error identification in human reliability assessment, part 1: overview of approaches. Appl. Ergon. 23, 299–318.
Kontogiannis, T., Embrey, D., 1997. A user-centred design approach for introducing computer-based process information systems. Appl. Ergon. 28, 109–119.
Kwahk, J., Han, S.H., 2002. A methodology for evaluating the usability of audiovisual consumer electronic products. Appl. Ergon. 33, 419–431.
Landauer, T.K., 1997. Behavioral research methods in human-computer interaction. In: Helander, M., Landauer, T.K., Prabhu, P. (Eds.), Handbook of Human–Computer Interaction. Elsevier Science, Oxford, pp. 203–227.


Lansdown, T.C., 2000. Driver visual allocation and the introduction of intelligent transport systems. P. I. Mech. Eng. D-J. Aut. 214, 645–652.
Liu, Y.-C., Bligh, T., Chakrabarti, A., 2003. Towards an 'ideal' approach for concept generation. Design Stud. 24, 341–355.
Liu, C.-C., Doong, J.-L., Hsu, C.-C., Huang, W.-S., Jeng, M.-C., 2009. Evidence for the selective attention mechanism and dual-task interference. Appl. Ergon. 40, 341–347.
Lyons, M., 2009. Towards a framework to select techniques for error prediction: supporting novice users in the healthcare sector. Appl. Ergon. 40, 379–395.
McClelland, I., 1991. Product assessment and user trials. In: Wilson, J.R., Corlett, E.N. (Eds.), Evaluation of Human Work: A Practical Ergonomics Methodology. Taylor and Francis, London, pp. 218–247.
Nielsen, J., Phillips, V.L., 1993. Estimating the relative usability of two interfaces: heuristic, formal, and empirical methods compared. In: Ashlund, S., Mullet, K., Henderson, A., Hollnagel, E., White, T. (Eds.), 11th Conference on Human Factors in Computing Systems, Amsterdam, 24–29 April, 1993.
Nielsen, J., 1992. Finding usability problems through heuristic evaluation. In: Conference on Human Factors in Computing Systems, Monterey, California, 3–7 May, 1992.
Nielsen, J., 1993. Usability Engineering, first ed. Academic Press, London.
Norman, D.A., 1983. Design principles for human computer interfaces. In: Janda, A. (Ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, 12–15 December, 1983.
Norman, D.A., 2002. The Design of Everyday Things, first ed. Basic Books, New York.
Noy, Y.I., Lemoine, T.L., Klachan, C., Burns, P.C., 2004. Task interruptability and duration as a measure of visual distraction. Appl. Ergon. 35, 207–213.
Pauzié, A., 2008. Evaluating driver mental workload using the driving activity load index (DALI). In: European Conference on Human Centred Design for Intelligent Transport Systems, Lyon, 3–4 April, 2008.
Preece, J., Rogers, Y., Sharp, H., 2002. Interaction Design: Beyond Human–Computer Interaction, first ed. John Wiley and Sons, New Jersey.
Rennie, A.M., 1981. The application of ergonomics to consumer product evaluation. Appl. Ergon. 12, 163–168.
Santos, J., Merat, N., Mouta, S., Brookhuis, K., de Waard, D., 2005. The interaction between driving and in-vehicle information systems: comparison of results from laboratory, simulator and real-world studies. Transport. Res. F-Traf. 8, 135–146.
Sauer, J., Sonderegger, A., 2009. The influence of prototype fidelity and aesthetics of design in usability tests: effects on user behaviour, subjective evaluation and emotion. Appl. Ergon. 40, 670–677.
Sauer, J., Seibel, K., Rüttinger, B., 2010. The influence of user expertise and prototype fidelity in usability tests. Appl. Ergon. 41, 130–140.
Shackel, B., 1986. Ergonomics in design for usability. In: Harrison, M.D., Monk, A.F. (Eds.), 2nd Conference of the British Computer Society Human Computer Special Interest Group, York, 23–26 September, 1986.
Shneiderman, B., 1992. Designing the User Interface: Strategies for Effective Human-Computer Interaction, second ed. Addison-Wesley, New York.
Stanton, N.A., Baber, C., 2008. Modelling of human alarm handling response times: a case study of the Ladbroke Grove rail accident in the UK. Ergonomics 51, 423–440.
Stanton, N.A., Salmon, P.M., 2009. Human error taxonomies applied to driving: a generic driver error taxonomy and its implications for intelligent transport systems. Saf. Sci. 47, 227–237.
Stanton, N.A., Young, M.S., 1998. Is utility in the mind of the beholder? A study of ergonomics methods. Appl. Ergon. 29, 41–54.
Stanton, N.A., Young, M.S., 1999a. A Guide to Methodology in Ergonomics: Designing for Human Use, first ed. Taylor and Francis, London.
Stanton, N.A., Young, M.S., 1999b. What price ergonomics? Nature, 197–198.
Stanton, N.A., Young, M.S., 2003. Giving ergonomics away? The application of ergonomics methods by novices. Appl. Ergon. 34, 479–490.
Stanton, N.A., Young, M., McCaulder, B., 1997. Drive-by-wire: the case of driver workload and reclaiming control with adaptive cruise control. Saf. Sci. 27, 149–159.
Stanton, N.A., Salmon, P.M., Walker, G.H., Baber, C., Jenkins, D.P., 2005. Human Factors Methods: A Practical Guide for Engineering and Design, first ed. Ashgate, Aldershot.
Stanton, N.A., 2006. Hierarchical task analysis: developments, applications, and extensions. Appl. Ergon. 37, 55–79.
Stevens, A., Board, A., Allen, P., Quimby, A., 1999. A Safety Checklist for the Assessment of In-vehicle Information Systems: A User's Manual. Transport Research Laboratory, London.
Stevens, A., Quimby, A., Board, A., Kersloot, T., Burns, P., 2002. Design Guidelines for Safety of In-vehicle Information Systems. Transport Research Laboratory, London.
Sweeney, M., Maguire, M., Shackel, B., 1993. Evaluating user-computer interaction: a framework. Int. J. Man Mach. Stud. 38, 689–711.
The European Conference of Ministers of Transport, 2003. Statement of Principles of Good Practice Concerning the Ergonomics and Safety of In-vehicle Information Systems. ECMT, Leipzig.
Victor, T.W., Engström, J., Harbluk, J.L., 2009. Distraction assessment methods based on visual behavior and event detection. In: Regan, M.A., Lee, J.D., Young, K.L. (Eds.), Driver Distraction: Theory, Effects and Mitigation. CRC Press, Florida, pp. 135–165.
Walker, G.H., Stanton, N.A., Young, M.S., 2001. Hierarchical task analysis of driving: a new research tool. In: Hanson, M. (Ed.), Contemporary Ergonomics 2001. Taylor and Francis, London, pp. 435–440.
Wickens, C.D., 1991. Processing resources and attention. In: Damos, D.L. (Ed.), Multiple-task Performance. Taylor and Francis, London, pp. 3–34.
Wittmann, M., Kiss, M., Gugg, P., Steffen, A., Fink, M., Pöppel, E., Kamiya, H., 2006. Effect of display position of a visual in-vehicle task on simulated driving. Appl. Ergon. 37, 187–199.
Young, K.L., Regan, M.A., Lee, J.D., 2009. Measuring the effects of driver distraction: direct driving performance methods and measures. In: Regan, M.A., Lee, J.D., Young, K.L. (Eds.), Driver Distraction: Theory, Effects and Mitigation. CRC Press, Florida, pp. 85–105.
Young, K.L., Lenné, M.G., Williamson, A.R., 2011. Sensitivity of the lane change test as a measure of in-vehicle system demand. Appl. Ergon. 42, 611–618.