BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations
Transcript of BELIV'10 Keynote: Conceptual and Practical Challenges in InfoViz Evaluations
[Slide 1]
Conceptual and practical challenges in InfoViz evaluations
Kasper Hornbæk
[Slide 2]
Concerns in evaluation
• Increase in InfoViz studies that include evaluation
• But concerns over those studies remain; see
  – The BELIV workshops
  – Reviews of InfoViz papers (e.g., Munzner 2008)
• Concerns are shared by research in
  – Usability evaluation (e.g., Gray & Salzman, 1998)
  – CHI (e.g., Greenberg & Buxton 2008, PC meeting)
[Slide 3]
Conceptual challenges
• Observations about (some) InfoViz studies:
  – Win-lose evaluations
  – Point designs
  – Lack of clarity in key constructs
  – Lack of theory-driven experiments
[Slide 4]
Example: The concept of overview
• An overview
  – “is constructed from, and represents, a collection of objects of interest” (Greene et al., 2000, p. 381)
  – “implies a qualitative awareness of one aspect of some data, preferably acquired rapidly and, even better, pre-attentively: that is, without cognitive effort” (Spence, 2007, p. 19)
Hornbæk & Hertzum (under review)
[Slide 5]
The concept of overview, cont’d
• 1391 occurrences of “overview” in 60 studies analyzed
• Used in three senses:
  – 81% technical (overview+detail visualization)
  – 14% user-centered (user forming an overview)
  – 5% other (“give an overview of the literature”)
• Does an overview give an overview?
Hornbæk & Hertzum (under review)
[Slide 6]
Example: Fisheye interfaces
• Degree-of-interest (DOI) function composed of a priori interest and distance components (see the sketch below)
• Mixed empirical results (e.g., Cockburn et al. 2008)
  – some optimistic (e.g., papers at this CHI)
  – some pessimistic (e.g., Lam & Munzner 2008)
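In Furnas’s (1986) classic formulation, an element’s degree of interest combines a fixed a priori interest (API) with a penalty for its distance from the current focus: DOI(x | focus) = API(x) − D(x, focus). A minimal sketch, assuming a source-code setting where distance is simply line separation; the names and the API heuristic are illustrative, not the implementation from the papers discussed here:

```python
def degree_of_interest(line, focus_line, api, distance_weight=1.0):
    """Furnas-style DOI: a priori interest minus weighted distance to the focus."""
    return api(line) - distance_weight * abs(line - focus_line)

# Hypothetical API: pretend every 5th line is a declaration and thus
# intrinsically more interesting than the lines around it.
api = lambda line: 10 if line % 5 == 0 else 0

# A fisheye view would render only lines whose DOI clears a threshold.
visible = [l for l in range(40) if degree_of_interest(l, focus_line=20, api=api) > -6]
print(visible)  # lines near the focus, plus high-API lines further away
```

The two components the talk questions are exactly the two terms here: the fixed `api` term (a priori interest) and the distance term tied to the user’s current focus.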
[Slide 7]
Jakobsen & Hornbæk, “Evaluating a fisheye view of source code”, CHI (2006)
[Slide 8]
[Screenshot: source-code editor with a labeled focus area and surrounding context view]
Jakobsen & Hornbæk, “Fisheyes in the field …”, CHI (2009)
[Slide 9]
Fisheye interfaces, cont’d
• Is a priori interest useful?
• Our data suggest maybe:
  – Hard to understand a priori components of the DOI function
  – Appears less useful in real-life programming than lines linked directly to the users’ focus
• But no one (?!) has generated direct evidence about this question
Jakobsen & Hornbæk (under review)
[Slide 10]
Strong inference
• Platt (1964) proposed the notion of strong inference, with the steps of
  – Devising alternative hypotheses
  – Devising a crucial experiment which will rule out some hypotheses
  – Carrying out the experiment
[Slide 11]
Radical solutions
• Newman (1994) reviewed CHI publications and found that 25% were radical solutions
[Slide 12]
Practical challenges
[Slide 13]
Simple outcome measures
Hornbæk, K. (2006), “Current Practice in Measuring Usability …”, IJHCS, 64 (2), 79-102
[Bar chart: percentage of studies using each outcome measure (binary task completion, spatial accuracy, error rates, comprehension/learning, recall, completeness, expert grading of product); x-axis 0%–30%]
[Slide 14]
Simple process measures
• 57% of studies measure task completion time
• Task completion time works as a summary measure of the process
• How should we interpret task completion time?
Hornbæk, K. (2006), “Current Practice in Measuring Usability …”, IJHCS, 64 (2), 79-102
[Slide 15]
Example: Comparing text viz.
Hornbæk & Frøkjær, TOCHI, 2003
[Slide 16]
[Slide 17]
[Slide 18]
Longer and more frequent exploration
[Bar chart: time for further exploration (min.), comparing Overview+Detail (N=59) and Linear (N=40); y-axis 0–3]
[Slide 19]
Standing on the shoulders of giants
• A key challenge is to build on the work of others in selecting:
  – Data
  – Tasks
  – Measures
  – Interfaces
• Crucial for replications, accumulating knowledge, validity/reliability
[Slide 20]
[Questionnaire illustration: item stems “The system is …” and “I feel …”, rated on a 1–5 scale]
[Slide 21]
accessible, adequate, annoying, anxiety, appealing, boring, clear, cluttered, comfortable, competent, comprehensible, conclusive, confident, conflict, confusing, connected, convenient, desirable, difficult, dislikable, dissatisfied, distracting, easy, effective, efficient, embarrassed, emotional, engaging, enjoyable, entertaining, enthusiasm, excellent, exciting, familiar, favorable, flexible, flustered, friendly, frustrating, fun, good, hate, helpfulness, immediate, important, improving, inefficient, intelligent, interested, intuitive, involved, irritation, learnable, likable, lively, loved, motivating, natural, nice, personal, plain, pleasant, preference, presence, productive, quality, quick, relevant, reliable, respect, responsive, satisfied, sensate, sense of being together, sense of control, sense of success, simple, smooth, sociable, social presence, stimulating, successful, sufficient, surprising, time consuming, timely, tiring, trust, uncomfortable, understand, useful, user friendly, vexed, vivid, warm, well-organized
[Slide 22]
Reliability of questionnaires
• Homegrown questionnaires have low reliability; 6 of the 20 are below .7

| Questionnaire | N | Cronbach’s α (mean) | Cronbach’s α (range) |
| --- | --- | --- | --- |
| Standard (e.g., TLX, QUIS, CSUQ) | 16 | .814 | .73–.95 |
| Homegrown | 20 | .736 | .21–.92 |

Table 1: Reliability of satisfaction questionnaires as indicated by Cronbach’s α. “Homegrown” refers to questionnaires that the authors themselves developed to capture ease of use.
Hornbæk & Law, CHI (2007)
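For reference, Cronbach’s α estimates internal consistency from the item variances relative to the variance of the summed scale: α = k/(k−1) × (1 − Σ item variances / variance of summed scale). A minimal sketch with illustrative data (not the data from the study above):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Illustrative data: 50 respondents answering 5 correlated 1-5 Likert items.
rng = np.random.default_rng(1)
trait = rng.normal(3, 1, size=(50, 1))               # latent attitude per respondent
items = np.clip(np.rint(trait + rng.normal(0, 0.8, size=(50, 5))), 1, 5)
print(f"alpha = {cronbach_alpha(items):.2f}")        # high (roughly .85-.9) for data this consistent
```

A homegrown questionnaire whose items tap different constructs would show much lower inter-item correlation and hence a lower α, which is what the .21 lower bound in Table 1 indicates.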
[Slide 23]
Example: Selecting tasks
• Tasks are crucial (thanks, Catherine)
• But chosen ad hoc, to match the evaluation, or habitually
• Which task would you use to evaluate an overview?
  – Monitoring rare, navigation frequent (Hornbæk & Hertzum, under review)
[Slide 24]
What to do?
• More strong, theoretically motivated comparisons (TILCS?)
• More complex measures of outcome and process, coupled with richer data
• Build on existing work, replications
• Task taxonomies, do task-level analysis, do field work
• Studies of adoption and integration
[Slide 25]
[Slide 26]
Construct validity
From Adcock, R. & Collier, D. (2001), “Measurement Validity: A Shared Standard …”, American Political Science Review, 95(3), 529–546
[Slide 27]
Results: are measures correlated?
• Effectiveness vs. efficiency: r = .247 ± .059
• May be interpreted as:
  – 6% variance explained (r² ≈ .247² ≈ .06)
  – a small (~.1) to medium (~.3) effect (Cohen 1969)
[Two scatter plots of errors against time, r = .229 and r = .23]
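To make the interpretation concrete: the squared correlation is the share of variance in one measure accounted for by the other. A sketch with illustrative data (not the study’s), tuned so the coupling is about as weak as in the correlations above:

```python
import numpy as np

# Illustrative data only: two weakly coupled usability measures.
rng = np.random.default_rng(0)
time = rng.normal(5, 1, 500)                    # task completion time (min.)
errors = 0.25 * time + rng.normal(0, 1, 500)    # errors, loosely tied to time
r = np.corrcoef(time, errors)[0, 1]
print(f"r = {r:.3f}, variance explained = {r**2:.0%}")  # r near .24, about 6%
```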
[Slide 28]
Results: are measures correlated?
• Task complexity does not influence the correlations
• More complex measures (e.g., quality of outcome) attenuate the correlations
• Difference between errors-along-the-way (.441) and task-completion errors (.155)