The Evolution of Evaluation


Transcript of The Evolution of Evaluation

  • The Evolution of Evaluation
    CHI 2007 alt.chi

    30 April 2007
    Joseph ‘Jofish’ Kaye & Phoebe Sengers
    Cornell University, Ithaca NY
    jofish @ cornell.edu, sengers @ cs.cornell.edu

  • What is evaluation?
    Part of the practice of HCI
    Part of the design-build-evaluate iterative design cycle
    A comparison of built to planned
    A place to reflect on both this and the next design
    And…
    A way of defining a field
    The space where a discipline validates the knowledge it creates.

  • What is evaluation?
    Something you do at the end of a project to show it works so you can publish it.
    A reason papers get rejected.
    Which, again, are other ways of saying:
    A way of defining a field
    The space where a discipline validates the knowledge it creates.

  • HCI Evaluation: Validity
    “Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or proof of concept system implementations.”
    — CHI 2007 Website

  • So…
    How and why did we end up with the system(s) we use for HCI evaluation today?
    How can our current approaches to evaluation deal with novel concepts of HCI, such as third-wave/paradigm or experience-focused (rather than task-focused) HCI?
    And in particular…

  • The Virtual Intimate Object (VIO)
    A device for couples in long-distance relationships to communicate intimacy.
    When one partner clicks, the other's circle lights up, and then fades over time.

    www.intimateobjects.org
    Kaye. I just clicked to say I love you. alt.chi, Ext. Abs. CHI 2006.
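    The click-and-fade behavior described above can be sketched as a toy model. This is a minimal illustrative sketch only: the class and method names, the linear decay, and the one-hour fade window are assumptions for illustration, not details taken from the VIO papers.

    ```python
    import time


    class VirtualIntimateObject:
        """Toy model of one partner's VIO: a click from the remote
        partner lights the circle at full brightness, which then
        fades over time. Names and fade window are illustrative."""

        FADE_SECONDS = 60 * 60  # assumed fade-out window (one hour)

        def __init__(self):
            self.last_remote_click = None  # time of partner's last click

        def receive_click(self, now=None):
            """Partner clicked: record the moment; the circle is now fully lit."""
            self.last_remote_click = now if now is not None else time.time()

        def brightness(self, now=None):
            """Brightness in [0, 1], decaying linearly since the last click."""
            if self.last_remote_click is None:
                return 0.0
            now = now if now is not None else time.time()
            elapsed = now - self.last_remote_click
            return max(0.0, 1.0 - elapsed / self.FADE_SECONDS)
    ```

    The point of the sketch is how little state the device carries: a single timestamp per partner, from which the ambient display is derived. Evaluating such a device on task metrics alone would miss everything the fading light is for.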

  • Evaluation of the VIO
    It's about the experience; it's not about the task.
    How can we measure intimacy and the transmission thereof?

    Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
    Kaye. I just clicked to say I love you. alt.chi, Ext. Abs. CHI 2006.

  • Understanding how we got to where we are today
    Evaluation by Engineers
    Evaluation by Computer Scientists
    Evaluation by Experimental Psychologists & Cognitive Scientists
    Evaluation by HCI Professionals
    Evaluation for Experience

  • (with case studies)
    Evaluation by Engineers
    Evaluation by Computer Scientists
    Evaluation by Experimental Psychologists & Cognitive Scientists
      Case study: Evaluation of Text Editors
    Evaluation by HCI Professionals
      Case study: Damaged Merchandise
    Evaluation for Experience

  • Why does evaluation evolve? Evolution is adaptation to fit changing conditions. What changes?

    Who are the users?
    Who are the evaluators?
    What are the limiting factors?

    p.s. note historical chunking and simplification

  • Evaluation by Engineers
    Users are engineers & mathematicians.
    Evaluators are engineers.
    The limiting factor is reliability.

  • Evaluation by Computer Scientists
    Users are programmers.
    Evaluators are programmers.
    The speed of the machine is the limiting factor.

  • Evaluation by Computer Scientists
    First uses of…
    “Human-computer interaction”:
    “It seems that when a system encourages close human-computer interaction, it also encourages close human-human and human-computer-human interaction” (Schwartz 1965)
    “Computer-human interaction”:
    “PLANIT: A Flexible Language Designed for Computer-Human Interaction” (Feingold 1967)

  • Evaluation by Experimental Psychologists & Cognitive Scientists
    Users are users: the computer is a tool, often in offices.
    Evaluators are cognitive scientists and experimental psychologists: they're used to measuring things through experiment.
    The limiting factor is what the human can do.

  • Case Study of Exp.Psych / Cog.Sci Evaluation: Text Editors
    Roberts & Moran, 1982, 1983.
    Their methodology for evaluating text editors had three criteria:
    objectivity
    thoroughness
    ease-of-use

  • Case Study: Text Editors
    Objectivity implies that the methodology not be biased in favor of any particular editor's conceptual structure.
    Thoroughness implies that multiple aspects of editor use be considered.
    Ease-of-use (of the method, not the editor itself): the methodology should be usable by editor designers, managers of word processing centers, or other nonpsychologists who need this kind of evaluative information but who have limited time and equipment resources.

  • Case Study: Text Editors

    “Text editors are the white rats of HCI”

    — Thomas Green, 1984, in Grudin, 1990.

  • Evaluation by HCI Professionals
    They believe in expertise over experiment (Nielsen 1984).
    They've made a decision to focus on better results, regardless of whether they were experimentally provable or not.

  • Evaluation by HCI Professionals
    Evaluators are usability professionals (often with Exp.Psych/Cog.Sci backgrounds).
    Users are (often) white-collar, using computers to accomplish their jobs.
    The limiting factor is the time of the worker accomplishing their job.

  • Case Study: The Damaged Merchandise Debate

  • Damaged Merchandise: Setup
    Early eighties: usability evaluation methods (UEMs)
    - heuristic evaluation (Nielsen)
    - cognitive walkthrough
    - GOMS
    - …

  • Damaged Merchandise: Comparison Studies
    Jeffries, Miller, Wharton and Uyeda (1991)
    Karat, Campbell and Fiegel (1992)
    Nielsen (1992)
    Desurvire, Kondziela, and Atwood (1992)
    Nielsen and Phillips (1993)

  • Damaged Merchandise: Panel
    Wayne D. Gray, panel at CHI ’95

    “Discount or Disservice? Discount Usability Analysis at a Bargain Price or Simply Damaged Merchandise?”

  • Damaged Merchandise: Paper
    Wayne D. Gray & Marilyn Salzman
    Special issue of Human-Computer Interaction: “Experimental Comparisons of Usability Evaluation Methods”

  • Damaged Merchandise: Response
    “Commentary on Damaged Merchandise”
    Karat: experiment in context
    Jeffries & Miller: real-world
    Lund & McClelland: practical
    John: case studies
    Monk: broad questions
    Oviatt: field-wide science
    MacKay: triangulate
    Newman: simulation & modelling

  • Damaged Merchandise: Clash of Paradigms
    Experimental Psychologists & Cognitive Scientists (who believe in experimentation)
    vs.
    HCI Professionals (who believe in experience and expertise, even if unprovable) (and who were trying to present their work in the terms of the dominant paradigm of the field).

    Kuhn (1962), The Structure of Scientific Revolutions

  • Damaged Merchandise: Clash of Paradigms
    In this particular work, we're not talking about who's right.
    It's about recognizing what paradigm clashes look like in HCI.
    It's about the need to present work in the terms of the dominant paradigm of the field.
    It's thinking about how to recognize and re-think our own approaches to knowing and doing HCI: an HCI that recognizes how it knows what it knows.

  • Experience-Focused HCI
    A possibly emerging sub-field, drawing from traditions and disciplines outside the field.
    Emphasis on the experience, not [just] the task.
    Thinking about technology as more like a car than a text editor.
    Wright & McCarthy, Gaver, Blythe, Höök, Taylor & Swan, Bødker, Petersen, Isbister

  • Experience-Focused HCI
    For example: how can you evaluate a car?
    Why do you drive what you drive?
    Grad-student-chic? Eco-chic? Machismo? Safety? Gay? Speed?
    For users, HCI is cultural as well as technological.
    We'll fail if we evaluate purely on task.

  • Experience-Focused HCI
    The users are people choosing to use technology for the joy of it, & to do what they want in everyday life.
    The evaluators are us, and ethnographers and designers and documentary filmmakers and writers and playwrights.
    The limiting factor might be how to express oneself, how to be and be seen (or not).

  • Why the evolution of evaluation matters
    New paradigms require new ways of knowing and new ways of evaluation.
    Difficulties come when one paradigm tries to present work in the manner of another paradigm.
    We need to actively recognize and call attention to when this happens, both as researchers and reviewers.

  • An evolving discussion

    SIG: Evaluation of Experience-focused HCI Thursday, 9am, Room C4

    Joseph ‘Jofish’ Kaye, [email protected] (paper & talk at jofish.com)
    Phoebe Sengers, [email protected]

    Research sponsored in part by the NSF and Microsoft Research Cambridge.
    Thanks to the Culturally Embedded Computing Group, BostonCHI, Alex Taylor, Ken Wood, Richard Harper, Abi Sellen, Shahram Izadi, Lorna Brown & the CMLG, Microsoft Cambridge, Apala Lahiri Chavan & Eric Schaffer, HFI, CHI Bangalore, CHI Mumbai, the Cornell S&TS Department, Maria Håkansson & IT University Göteborg, Louise Barkhuus, Barry Brown & University of Glasgow, Mark Blythe & University of York, Andy Warr & the Oxford E-Research Center, Susanne Bødker, Marianne Graves Petersen & The University of Aarhus, Terry Winograd, Wendy Ju, Scott Klemmer & The Stanford HCI Seminar, Jonathan Grudin, Liam Bannon, Gilbert Cockton, William Newman, Kirsten Boehner, Jeff Hancock, Bill Gaver, Janet Vertesi, Kia Höök, Jarmo Laaksolahti, Anna Ståhl, Helen Jeffries, Paul Dourish, Jen Rode, Peter Wright, Ryan Aipperspach, Bill Buxton, Michael Lynch, Seth Beemer McGinnis & Katherine Isbister.

    *************************
    Neither term has much sway before 1980, with less than ten references in each case in the ACM library prior to that date. Standard terms at that time were “Man-Computer Dialog” (Marin 1973) or “Man-Machine Communication,” the title of a textbook (Meadow 1970), and with an initial use in 1961 (Ross 1961). Said textbook's introduction includes an apology to feminists for the use of the word “man” but bemoans the clunkiness of any conceivable alternative, such as “Man and Woman Machine Communication,” a limitation we seem to have somehow overcome. Grudin notes in (Grudin 2006) CHI's deliberate push for gender-neutral terminology from the start, with their deliberate rejection of the terminology of “man-machine communication.”

    **********
    This tells us more about HCI than about text editors. But importantly, it exposes the default paradigm: this lets us see what's happening.
    **********************
    I drive a grey 1997 Saab convertible.
    Here's one story about what Experience-focused HCI might be about.

    And it's emerging.
    **Everybody: Dangerous claim.**