Usability Testing: What Have We Overlooked?

Transcript of Usability Testing: What Have We Overlooked?

Page 1: Usability Testing: What Have We Overlooked?

Happy Earth Day!

Page 2: Usability Testing: What Have We Overlooked?

Usability Testing: What Have We Overlooked?

By Gitte Lindgaard and Jarinee Chattratichart

Presented by Veronica Nixon 4/22/08

Page 3: Usability Testing: What Have We Overlooked?

About the authors

Lindgaard

- Human-Oriented Technology Lab, Ottawa

- Psychology
- Human decision making
- HCI for product design

Chattratichart

- Senior lecturer, Faculty of Computing, Information Systems and Mathematics, Kingston University, UK

- Web usability issues

Page 4: Usability Testing: What Have We Overlooked?

Overview

Usability testing: discovery of usability problems with a product through simulated usability tasks

Claim: User task coverage is more important than the number of participants in predicting the proportion of problems detected.

Page 5: Usability Testing: What Have We Overlooked?

Bibliography

Lots of work has been done on the recruitment size problem:

- Benefits of increased sample sizes in usability testing (2003)
- Why you only need to test with 5 users (2000)
- Five users is nowhere near enough (2001)
- Refining the test phase of usability evaluation: How many subjects is enough? (1992)

Page 6: Usability Testing: What Have We Overlooked?

Research questions: novel?

1. Is there a correlation between the number of participants and the proportion of problems found? (Old problem, novel approach)

2. Is there a correlation between the number of user tasks and the proportion of problems found? (No citations about this in the bibliography)

Page 7: Usability Testing: What Have We Overlooked?

History of the sample size problem

1992 – Proportion of problems found = 1 - (1 - p)^n, where p is the probability of detecting a problem and n is the number of participants

1993 – Number of problems found = N × (1 - (1 - p)^n), where N is the number of problems present. Same formula?

1998 – Number of problems found = the geometric mean of the number of evaluators and the number of participants

2000 – “Magic Number 5”: assumes all problems are equally likely to be detected; works if the probability of detection is 0.31 and detecting 80% of problems is acceptable

But other studies only detected 35% and 55% of problems, and showed that the large variability in tasks and user characteristics makes the above assumptions inaccurate.

Given that the probability of detecting one problem varies wildly depending on the type of problem, how can you just assume a 31% probability?

And why does none of this depend on the type of participant?
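As a quick arithmetic check of the model behind these formulas (a sketch, assuming the equal-detectability model and the quoted p = 0.31; the helper name is mine):

```python
# Numerical check of the detection model above:
#   proportion of problems found = 1 - (1 - p)^n
# where p is the per-problem detection probability for one participant
# and n is the number of participants. This assumes every problem is
# equally detectable, which is exactly the assumption questioned above.

def proportion_found(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# The "Magic Number 5" case: p = 0.31 and n = 5 participants.
print(f"{proportion_found(0.31, 5):.1%}")  # 84.4%, i.e. above the 80% target

# The same five participants when problems are harder to detect:
for p in (0.31, 0.20, 0.10, 0.05):
    print(f"p = {p:.2f}: {proportion_found(p, 5):.0%} of problems expected after 5 users")
# p = 0.31: 84%   p = 0.20: 67%   p = 0.10: 41%   p = 0.05: 23%
```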

Page 12: Usability Testing: What Have We Overlooked?

An observation

The authors could have used the literature review to back up their assumption that HCI is “focused” on participant studies.

Page 13: Usability Testing: What Have We Overlooked?

CUE-1 CUE-2 CUE-3 CUE-4 CUE-5 CUE-6 (the Comparative Usability Evaluation studies)

- http://www.hotelpenn.com/
- 17 independent evaluation teams
- March 2003
- Other CUE projects have evaluated Enterprise Rent-A-Car, Ikea, Avis, Hotmail, and Windows

Page 14: Usability Testing: What Have We Overlooked?

Methods

Correlation analysis of outcomes of usability tests conducted by 9 different test teams.

Controlled variables:
- All were experienced testers
- All evaluated Hotel Pennsylvania’s website’s OneScreen reservation system
- All used the think-aloud method
- All completed at about the same time
- Same evaluation objectives
- All used the same reporting format
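The slides do not reproduce the paper's exact statistical procedure; as a rough sketch of the kind of correlation analysis named above, here is one way per-team outcomes could be analyzed. The Spearman statistic is an assumption, and every number below is an illustrative placeholder, not the CUE data.

```python
# Rough sketch of a correlation analysis over per-team outcomes.
# Spearman rank correlation is an assumption about the statistic used, and
# the numbers below are illustrative placeholders, NOT the CUE / paper data.
from scipy.stats import spearmanr

participants   = [6, 7, 9, 10, 12, 5, 8, 6, 11]    # test users per team (placeholder)
task_coverage  = [4, 9, 5, 7, 12, 3, 10, 6, 8]     # distinct user tasks per team (placeholder)
problems_found = [0.12, 0.30, 0.15, 0.22, 0.43,
                  0.07, 0.35, 0.18, 0.25]          # proportion of problems found (placeholder)

for name, xs in (("number of participants", participants),
                 ("task coverage", task_coverage)):
    rho, p_value = spearmanr(xs, problems_found)
    print(f"{name} vs. proportion of problems found: rho = {rho:.2f} (p = {p_value:.3f})")
```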

Page 15: Usability Testing: What Have We Overlooked?

Methods

1. Identified the number of participants in each study

2. Listed user tasks and grouped them using affinity analysis

3. Problem analysis:
A. Categorization
B. Tokenization
C. Affinity analysis
D. Only severe, unique problems retained
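The paper's problem analysis was a manual process; purely as a loose illustration of steps 3B-3D, here is a sketch of tokenizing problem reports, grouping near-duplicates, and keeping only severe, unique problems. The report texts, the overlap rule, and the severity labels are all invented for the example.

```python
# Loose illustration of steps 3B-3D: tokenize problem reports, group
# near-duplicates (a crude stand-in for affinity analysis), and keep only
# severe, unique problems. All reports, the overlap rule, and the severity
# labels are invented; the paper's analysis was done by hand, not by code.
from collections import defaultdict

reports = [
    ("Date picker resets the selected dates", "severe"),
    ("date picker loses chosen dates!", "severe"),
    ("Room rate shown without taxes", "minor"),
    ("No confirmation page after booking", "severe"),
]

def tokens(text: str) -> frozenset:
    # Very crude tokenization: lowercase words with punctuation stripped.
    return frozenset(w.strip(".,!?") for w in text.lower().split())

groups = defaultdict(list)
for text, severity in reports:
    # Put a report in an existing group if it shares at least two tokens with it.
    key = next((k for k in groups if len(k & tokens(text)) >= 2), tokens(text))
    groups[key].append((text, severity))

# Keep one representative per group, and only groups containing a severe report.
unique_severe = [items[0] for items in groups.values()
                 if any(sev == "severe" for _, sev in items)]
print(unique_severe)
```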

Page 16: Usability Testing: What Have We Overlooked?

Results

No correlation was found between the number of test users and the number of problems (or new problems) found

A correlation was found between task coverage and the number of problems (and new problems) found

Heterogeneity and representativeness of participants, together with the number of participants and the number of user tasks, seem to lead to higher problem detection

Page 17: Usability Testing: What Have We Overlooked?

Discussion

Though all teams had more than 5 participants, the proportion of problems found ranged only between 7% and 43%. This doesn’t support the Magic Number 5 convention.

The formula 1 - (1 - p)^n assumes that problems are equally detectable and randomly discovered. The type of application, the type of test user, and test team experience vary widely and may affect the detectability of problems.
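To make the sensitivity concrete, here is a small sketch (the 80% target and the p values are illustrative) of how quickly the required number of participants grows under the same equal-detectability model as p drops below the assumed 0.31:

```python
# How many participants n does the model require so that 1 - (1 - p)^n >= target?
# Rearranging gives n >= log(1 - target) / log(1 - p).
import math

def users_needed(p: float, target: float = 0.80) -> int:
    return math.ceil(math.log(1 - target) / math.log(1 - p))

for p in (0.31, 0.20, 0.10, 0.05):
    print(f"p = {p:.2f}: {users_needed(p)} participants needed for 80% detection")
# p = 0.31 -> 5, p = 0.20 -> 8, p = 0.10 -> 16, p = 0.05 -> 32
```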

Participants who “get in character” may find more problems

Overall, careful attention to task design and user recruitment cannot entirely account for variation in problem discovery.

However, for optimum return on investment, it is wiser to invest more time designing more user tasks than to recruit more participants.

Page 18: Usability Testing: What Have We Overlooked?

Future research

User task types

Participant recruitment

Test user personas

Interactions between these variables

Page 19: Usability Testing: What Have We Overlooked?

Discussion questions: readability

Results given up front

Bad sentences: “…high enough to reveal most…problems, but not too high to keep the running costs manageable.”

?

Page 20: Usability Testing: What Have We Overlooked?

Class discussion: assumptions

HCI has paid too much attention to the problem of the number of participants over the last 15 years.

?

Page 21: Usability Testing: What Have We Overlooked?

Discussion questions

Credibility

Applicability

Generalizability

Scalability

Page 22: Usability Testing: What Have We Overlooked?

The End