Usability Testing: What Have We Overlooked?

Transcript of Usability Testing: What Have We Overlooked?

Page 1: Usability Testing: What Have We Overlooked?

Happy Earth Day!

Page 2: Usability Testing: What Have We Overlooked?

Usability Testing: What Have We Overlooked?

By Gitte Lindgaard and Jarinee Chattratichart

Presented by Veronica Nixon 4/22/08

Page 3: Usability Testing: What Have We Overlooked?

About the authors

Lindgaard

- Human-Oriented Technology Lab, Ottawa

- Psychology
- Human decision making
- HCI for product design

Chattratichart

- Senior lecturer, Faculty of Computing, Information Systems and Mathematics, Kingston University, UK

- Web usability issues

Page 4: Usability Testing: What Have We Overlooked?

Overview

Usability testing: discovery of usability problems with a product through simulated usability tasks

Claim: User task coverage is more important than the number of participants in predicting the proportion of problems detected.

Page 5: Usability Testing: What Have We Overlooked?

Bibliography

Lots of work has been done on the recruitment size problem:

- Benefits of increased sample sizes in usability testing (2003)
- Why you only need to test with 5 users (2000)
- Five users is nowhere near enough (2001)
- Refining the test phase of usability evaluation: How many subjects is enough? (1992)

Page 6: Usability Testing: What Have We Overlooked?

Research questions: novel?

1. Is there a correlation between the number of participants and the proportion of problems found? (Old problem, novel approach)

2. Is there a correlation between the number of user tasks and the proportion of problems found? (No citations about this in the bibliography)

Page 7: Usability Testing: What Have We Overlooked?

History of the sample size problem

1992 – Proportion of problems found = 1 - (1 - p)^n, where p is the probability of detecting a problem and n is the number of participants

1993 – Number of problems found = N × (1 - (1 - p)^n), where N is the number of problems present. Same formula?

1998 – Number of problems found = the geometric mean of the number of evaluators and the number of participants

2000 – “Magic Number 5”: assumes all problems are equally likely to be detected; works if the probability of detection is 0.31 and detecting 80% of problems is acceptable

But other studies only detected 35% and 55% of problems, and showed that the large variability in tasks and user characteristics makes the above assumptions inaccurate.

Given that the probability of detecting one problem varies wildly depending on the type of problem, how can you just assume a 31% probability?

And why does none of this depend on the type of participant?
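As a quick arithmetic check of the model behind these formulas (a sketch, assuming the equal-detectability model and the quoted p = 0.31; the helper name is mine):

```python
# Numerical check of the detection model above:
#   proportion of problems found = 1 - (1 - p)^n
# where p is the per-problem detection probability for one participant
# and n is the number of participants. This assumes every problem is
# equally detectable, which is exactly the assumption questioned above.

def proportion_found(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# The "Magic Number 5" case: p = 0.31 and n = 5 participants.
print(f"{proportion_found(0.31, 5):.1%}")  # 84.4%, i.e. above the 80% target

# The same five participants when problems are harder to detect:
for p in (0.31, 0.20, 0.10, 0.05):
    print(f"p = {p:.2f}: {proportion_found(p, 5):.0%} of problems expected after 5 users")
# p = 0.31: 84%   p = 0.20: 67%   p = 0.10: 41%   p = 0.05: 23%
```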

Page 12: Usability Testing: What Have We Overlooked?

An observation

The authors could have used the literature review to back up their assumption that HCI is “focused” on participant studies.

Page 13: Usability Testing: What Have We Overlooked?

CUE-1 CUE-2 CUE-3 CUE-4 CUE-5 CUE-6 (the Comparative Usability Evaluation studies)

- http://www.hotelpenn.com/
- 17 independent evaluation teams
- March 2003
- Other CUE projects have evaluated Enterprise Rent-A-Car, Ikea, Avis, Hotmail, and Windows

Page 14: Usability Testing: What Have We Overlooked?

Methods

Correlation analysis of outcomes of usability tests conducted by 9 different test teams.

Controlled variables:
- All were experienced testers
- All evaluated Hotel Pennsylvania’s website’s OneScreen reservation system
- All used the think-aloud method
- All completed at about the same time
- Same evaluation objectives
- All used the same reporting format
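The slides do not reproduce the paper's exact statistical procedure; as a rough sketch of the kind of correlation analysis named above, here is one way per-team outcomes could be analyzed. The Spearman statistic is an assumption, and every number below is an illustrative placeholder, not the CUE data.

```python
# Rough sketch of a correlation analysis over per-team outcomes.
# Spearman rank correlation is an assumption about the statistic used, and
# the numbers below are illustrative placeholders, NOT the CUE / paper data.
from scipy.stats import spearmanr

participants   = [6, 7, 9, 10, 12, 5, 8, 6, 11]    # test users per team (placeholder)
task_coverage  = [4, 9, 5, 7, 12, 3, 10, 6, 8]     # distinct user tasks per team (placeholder)
problems_found = [0.12, 0.30, 0.15, 0.22, 0.43,
                  0.07, 0.35, 0.18, 0.25]          # proportion of problems found (placeholder)

for name, xs in (("number of participants", participants),
                 ("task coverage", task_coverage)):
    rho, p_value = spearmanr(xs, problems_found)
    print(f"{name} vs. proportion of problems found: rho = {rho:.2f} (p = {p_value:.3f})")
```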

Page 15: Usability Testing: What Have We Overlooked?

Methods

1. Identified the number of participants in each study

2. Listed user tasks and grouped them using affinity analysis

3. Problem analysis:
A. Categorization
B. Tokenization
C. Affinity analysis
D. Only severe, unique problems retained
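The paper's problem analysis was a manual process; purely as a loose illustration of steps 3B-3D, here is a sketch of tokenizing problem reports, grouping near-duplicates, and keeping only severe, unique problems. The report texts, the overlap rule, and the severity labels are all invented for the example.

```python
# Loose illustration of steps 3B-3D: tokenize problem reports, group
# near-duplicates (a crude stand-in for affinity analysis), and keep only
# severe, unique problems. All reports, the overlap rule, and the severity
# labels are invented; the paper's analysis was done by hand, not by code.
from collections import defaultdict

reports = [
    ("Date picker resets the selected dates", "severe"),
    ("date picker loses chosen dates!", "severe"),
    ("Room rate shown without taxes", "minor"),
    ("No confirmation page after booking", "severe"),
]

def tokens(text: str) -> frozenset:
    # Very crude tokenization: lowercase words with punctuation stripped.
    return frozenset(w.strip(".,!?") for w in text.lower().split())

groups = defaultdict(list)
for text, severity in reports:
    # Put a report in an existing group if it shares at least two tokens with it.
    key = next((k for k in groups if len(k & tokens(text)) >= 2), tokens(text))
    groups[key].append((text, severity))

# Keep one representative per group, and only groups containing a severe report.
unique_severe = [items[0] for items in groups.values()
                 if any(sev == "severe" for _, sev in items)]
print(unique_severe)
```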

Page 16: Usability Testing: What Have We Overlooked?

Results

No correlation was found between the number of test users and the number of problems (or new problems) found

A correlation was found between task coverage and the number of problems (and new problems) found

Heterogeneity and representativeness of participants, together with the number of participants and the number of user tasks, seem to lead to higher problem detection

Page 17: Usability Testing: What Have We Overlooked?

Discussion

Though all teams had more than 5 participants, the proportion of problems found ranged only between 7% and 43%. This doesn’t support the Magic Number 5 convention.

The formula 1 - (1 - p)^n assumes that problems are equally detectable and randomly discovered. The type of application, the type of test user, and test team experience vary widely and may affect the detectability of problems.
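To make the sensitivity concrete, here is a small sketch (the 80% target and the p values are illustrative) of how quickly the required number of participants grows under the same equal-detectability model as p drops below the assumed 0.31:

```python
# How many participants n does the model require so that 1 - (1 - p)^n >= target?
# Rearranging gives n >= log(1 - target) / log(1 - p).
import math

def users_needed(p: float, target: float = 0.80) -> int:
    return math.ceil(math.log(1 - target) / math.log(1 - p))

for p in (0.31, 0.20, 0.10, 0.05):
    print(f"p = {p:.2f}: {users_needed(p)} participants needed for 80% detection")
# p = 0.31 -> 5, p = 0.20 -> 8, p = 0.10 -> 16, p = 0.05 -> 32
```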

Participants who “get in character” may find more problems

Overall, careful attention to task design and user recruitment cannot entirely account for variation in problem discovery.

However, for optimum return on investment, it is wiser to invest more time designing more user tasks than to recruit more participants.

Page 18: Usability Testing: What Have We Overlooked?

Future research

User task types

Participant recruitment

Test user personas

Interactions between these variables

Page 19: Usability Testing: What Have We Overlooked?

Discussion questions: readability

Results given up front

Bad sentences: “…high enough to reveal most…problems, but not too high to keep the running costs manageable.”

?

Page 20: Usability Testing: What Have We Overlooked?

Class discussion: assumptions

HCI has paid too much attention to the problem of the number of participants over the last 15 years.

?

Page 21: Usability Testing: What Have We Overlooked?

Discussion questions

Credibility

Applicability

Generalizability

Scalability

Page 22: Usability Testing: What Have We Overlooked?

The End