
Conferences in Research and Practice in Information Technology

Volume 126

User Interfaces 2012

Australian Computer Science Communications, Volume 34, Number 5


User Interfaces 2012

Proceedings of the Thirteenth Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, 31 January – 3 February 2012

Haifeng Shen and Ross T. Smith, Eds.

Volume 126 in the Conferences in Research and Practice in Information Technology Series. Published by the Australian Computer Society Inc.

Published in association with the ACM Digital Library.


User Interfaces 2012. Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January – 3 February 2012

Conferences in Research and Practice in Information Technology, Volume 126.

Copyright © 2012, Australian Computer Society. Reproduction for academic, not-for-profit purposes permitted provided the copyright text at the foot of the first page of each paper is included.

Editors:

Haifeng Shen
School of Computer Science, Engineering and Mathematics
Flinders University
GPO Box 2100
Adelaide, South Australia 5001
Australia
Email: [email protected]

Ross T. Smith
School of Computer and Information Science
University of South Australia
GPO Box 2471
Adelaide, South Australia 5001
Australia
Email: [email protected]

Series Editors:
Vladimir Estivill-Castro, Griffith University, Queensland
Simeon J. Simoff, University of Western Sydney, NSW
Email: [email protected]

Publisher: Australian Computer Society Inc.
PO Box Q534, QVB Post Office
Sydney 1230
New South Wales
Australia

Conferences in Research and Practice in Information Technology, Volume 126.
ISSN 1445-1336.
ISBN 978-1-921770-07-4.

Printed, January 2012 by University of Western Sydney, on-line proceedings.
Printed, January 2012 by RMIT, electronic media.
Document engineering by CRPIT.

The Conferences in Research and Practice in Information Technology series disseminates the results of peer-reviewed research in all areas of Information Technology. Further details can be found at http://crpit.com/.


Table of Contents

Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January – 3 February 2012

Preface

Programme Committee

Organising Committee

Welcome from the Organising Committee

CORE - Computing Research & Education

ACSW Conferences and the Australian Computer Science Communications

ACSW and AUIC 2012 Sponsors

Contributed Papers

Website Navigation Tools - A Decade of Design Trends 2002 to 2011
Chris J. Pilgrim

Leveraging Human Movement in the Ultimate Display
Rohan McAdam, Keith Nesbitt

Website Accessibility: An Australian View
Jonathon Grantham, Elizabeth Grantham, David Powers

Merging Tangible Buttons and Spatial Augmented Reality to Support Ubiquitous Prototype Designs
Tim M. Simon, Ross T. Smith, Bruce Thomas, Stewart Von Itzstein, Mark Smith, Joonsuk Park, Jun Park

A Virtual Touchscreen with Depth Recognition
Gabriel Hartmann, Burkhard C. Wunsche

Evaluating Indigenous Design Features Using Cultural Dimensions
Reece George, Keith Nesbitt, Michael Donovan, John Maynard

Enhancing 3D Applications Using Stereoscopic 3D and Motion Parallax
Ivan K. Y. Li, Edward M. Peek, Burkhard C. Wunsche, Christof Lutteroth

An Evaluation of a Sketch-Based Model-by-Example Approach for Crowd Modelling
Li Guan, Burkhard C. Wunsche

Supporting Freeform Modelling in Spatial Augmented Reality Environments with a New Deformable Material
Ewald T. A. Mass, Michael R. Marner, Ross T. Smith, Bruce H. Thomas

Contributed Posters

Service History: The Challenges of the 'Back button' in Mobile Context-aware Systems
Annika Hinze, Knut Muller, George Buchanan

An Investigation of Factors Driving Virtual Communities
Jonathon Grantham, Cullen Habel

Feasibility of Computational Estimation of Task-Oriented Visual Attention
Yorie Nakahira, Minoru Nakayama

Magnetic Substrate for Use with Tangible Spatial Augmented Reality in Rapid Prototyping Workflows
Tim M. Simon, Ross T. Smith

Data Mining Office Behavioural Information from Simple Sensors
Samuel J. O'Malley, Ross T. Smith and Bruce H. Thomas

Author Index

Preface

Welcome to Melbourne and the 13th Australasian User Interface Conference, the forum for user interface researchers and practitioners at the Australasian Computer Science Week 2012. AUIC provides an opportunity for user interface researchers in the areas of HCI, CSCW, and pervasive computing to meet with colleagues and with other computer scientists, and aims to strengthen the community of researchers in Australasia.

The papers presented in these proceedings have been rigorously reviewed. Out of 19 submitted papers, 9 were selected for presentation and 5 for posters. The breadth and quality of the papers reflect the dynamic and innovative Australasian research environment.

We offer our sincere thanks to the people who made this year's conference possible: the authors and participants, the programme committee members and reviewers, the ACSW organisers, and the Australian Computer Society.

Haifeng Shen
Flinders University

Ross T. Smith
University of South Australia

AUIC 2012 Programme Chairs
January 2012


Programme Committee

Chairs

Haifeng Shen, Flinders University, Australia
Ross T. Smith, University of South Australia, Australia

Web Chair

James Walsh, University of South Australia, Australia

Members

Mark Apperley, University of Waikato, New Zealand
Rachel Blagojevic, University of Auckland, New Zealand
Paul Calder, Flinders University, Australia
David Chen, Griffith University, Australia
Sally Jo Cunningham, University of Waikato, New Zealand
John Grundy, Swinburne University of Technology, Australia
Stewart Von Itzstein, University of South Australia, Australia
Christof Lutteroth, University of Auckland, New Zealand
Stuart Marshall, Victoria University of Wellington, New Zealand
Masood Masoodian, University of Waikato, New Zealand
Christian Müller-Tomfelde, CSIRO, Australia
Aaron Toney, Nokia
Beryl Plimmer, University of Auckland, New Zealand
Gerald Weber, University of Auckland, New Zealand
Burkhard Wunsche, University of Auckland, New Zealand
Joanne Zucco, University of South Australia, Australia

Additional Reviewers

Jingzhi Guo, University of Macau, Macau
Brett Wilkinson, Flinders University, Australia
Tim Simon, University of South Australia, Australia


Organising Committee

Members

Dr. Daryl D'Souza
Assoc. Prof. James Harland (Chair)
Dr. Falk Scholer
Dr. John Thangarajah
Assoc. Prof. James Thom
Dr. Jenny Zhang


Welcome from the Organising Committee

On behalf of the Australasian Computer Science Week 2012 (ACSW2012) Organising Committee, we welcome you to this year's event hosted by RMIT University. RMIT is a global university of technology and design and Australia's largest tertiary institution. The University enjoys an international reputation for excellence in practical education and outcome-oriented research. RMIT is a leader in technology, design, global business, communication, global communities, health solutions and urban sustainable futures. RMIT was ranked in the top 100 universities in the world for engineering and technology in the 2011 QS World University Rankings. RMIT has three campuses in Melbourne, Australia, and two in Vietnam, and offers programs through partners in Singapore, Hong Kong, mainland China, Malaysia, India and Europe. The University's student population of 74,000 includes 30,000 international students, of whom more than 17,000 are taught offshore (almost 6,000 at RMIT Vietnam).

We welcome delegates from a number of different countries, including Australia, New Zealand, Austria,Canada, China, the Czech Republic, Denmark, Germany, Hong Kong, Japan, Luxembourg, Malaysia, SouthKorea, Sweden, the United Arab Emirates, the United Kingdom, and the United States of America.

We hope you will enjoy ACSW2012, and also take the opportunity to experience the city of Melbourne. Melbourne is amongst the world's most liveable cities for its safe and multicultural environment as well as its well-developed infrastructure. Melbourne's skyline is a mix of cutting-edge designs and heritage architecture. The city is famous for its restaurants, fashion boutiques, cafe-filled laneways, bars, art galleries, and parks.

RMIT's city campus, the venue of ACSW2012, is right in the heart of the Melbourne CBD, and can be easily accessed by train or tram.

ACSW2012 consists of the following conferences:

– Australasian Computer Science Conference (ACSC) (Chaired by Mark Reynolds and Bruce Thomas)
– Australasian Database Conference (ADC) (Chaired by Rui Zhang and Yanchun Zhang)
– Australasian Computer Education Conference (ACE) (Chaired by Michael de Raadt and Angela Carbone)
– Australasian Information Security Conference (AISC) (Chaired by Josef Pieprzyk and Clark Thomborson)
– Australasian User Interface Conference (AUIC) (Chaired by Haifeng Shen and Ross Smith)
– Computing: Australasian Theory Symposium (CATS) (Chaired by Julian Mestre)
– Australasian Symposium on Parallel and Distributed Computing (AusPDC) (Chaired by Jinjun Chen and Rajiv Ranjan)
– Australasian Workshop on Health Informatics and Knowledge Management (HIKM) (Chaired by Kerryn Butler-Henderson and Kathleen Gray)
– Asia-Pacific Conference on Conceptual Modelling (APCCM) (Chaired by Aditya Ghose and Flavio Ferrarotti)
– Australasian Computing Doctoral Consortium (ACDC) (Chaired by Falk Scholer and Helen Ashman)

ACSW is an event that requires a great deal of co-operation from a number of people, and this year has been no exception. We thank all who have worked for the success of ACSW 2012, including the Organising Committee, the Conference Chairs and Programme Committees, the RMIT School of Computer Science and IT, the RMIT Events Office, our sponsors, our keynote and invited speakers, and the attendees.

Special thanks go to Alex Potanin, the CORE Conference Coordinator, for his extensive expertise, knowledge and encouragement, and to the organisers of previous ACSW meetings, who have provided us with a great deal of information and advice. We hope that ACSW2012 will be as successful as its predecessors.

Assoc. Prof. James Harland
School of Computer Science and Information Technology, RMIT University

ACSW2012 Chair
January, 2012


CORE - Computing Research & Education

CORE welcomes all delegates to ACSW2012 in Melbourne. CORE, the peak body representing academic computer science in Australia and New Zealand, is responsible for the annual ACSW series of meetings, which are a unique opportunity for our community to network and to discuss research and topics of mutual interest. The original component conferences - ACSC, ADC, and CATS, which formed the basis of ACSW in the mid 1990s - now share this week with seven other events - ACE, AISC, AUIC, AusPDC, HIKM, ACDC, and APCCM, which build on the diversity of the Australasian computing community.

In 2012, we have again chosen to feature a small number of keynote speakers from across the discipline: Michael Kolling (ACE), Timo Ropinski (ACSC), and Manish Parashar (AusPDC). I thank them for their contributions to ACSW2012. I also thank the invited speakers in some of the individual conferences, and the two CORE award winners, Warwick Irwin (CORE Teaching Award) and Daniel Frampton (CORE PhD Award). The efforts of the conference chairs and their programme committees have led to strong programs in all the conferences; thank you very much for all your efforts. Thanks are particularly due to James Harland and his colleagues for organising what promises to be a strong event.

The past year has been very turbulent for our disciplines. We tried to convince the ARC that refereed conference publications should be included in ERA2012 evaluations, and were partially successful. We ran a small pilot which demonstrated that conference citations behave similarly to, but not exactly the same as, journal citations, so the latter cannot be scaled to estimate the former. The ARC consequently moved all of Field of Research Code 08, "Information and Computing Sciences", to peer review for ERA2012. The effect of this will be that most universities will be evaluated at least at the two-digit 08 level, as refereed conference papers count towards the threshold of 50 outputs required for evaluation. CORE's position is to return 08 to a citation-measured discipline as soon as possible.

ACSW will feature a joint CORE and ACDICT discussion on Research Challenges in ICT, which I hope will identify a national research agenda as well as priority application areas to which our disciplines can contribute, and perhaps an opportunity to find international multi-disciplinary successes which could work in our region.

Beyond research issues, in 2012 CORE will also need to focus on education issues, including in schools. The likelihood that the future will have fewer computers is small, yet where are the numbers of students we need?

CORE's existence is due to the support of the member departments in Australia and New Zealand, and I thank them for their ongoing contributions, in commitment and in financial support. Finally, I am grateful to all those who gave their time to CORE in 2011; in particular, I thank Alex Potanin, Alan Fekete, Aditya Ghose, Justin Zobel, and those of you who contribute to the discussions on the CORE mailing lists. There are three main lists: csprofs, cshods and members. You are all eligible for the members list if your department is a member. Please do sign up via http://lists.core.edu.au/mailman/listinfo - we try to keep the volume low but the relevance high.

Tom Gedeon

President, CORE
January, 2012


ACSW Conferences and the Australian Computer Science Communications

The Australasian Computer Science Week of conferences has been running in some form continuously since 1978. This makes it one of the longest running conference series in computer science. The proceedings of the week have been published as the Australian Computer Science Communications since 1979 (with the 1978 proceedings often referred to as Volume 0). Thus the sequence number of the Australasian Computer Science Conference is always one greater than the volume of the Communications. Below is a list of the conferences, their locations and hosts.

2013. Volume 35. Host and Venue - University of South Australia, Adelaide, SA.
2012. Volume 34. Host and Venue - RMIT University, Melbourne, VIC.
2011. Volume 33. Host and Venue - Curtin University of Technology, Perth, WA.
2010. Volume 32. Host and Venue - Queensland University of Technology, Brisbane, QLD.
2009. Volume 31. Host and Venue - Victoria University, Wellington, New Zealand.
2008. Volume 30. Host and Venue - University of Wollongong, NSW.
2007. Volume 29. Host and Venue - University of Ballarat, VIC. First running of HDKM.
2006. Volume 28. Host and Venue - University of Tasmania, TAS.
2005. Volume 27. Host - University of Newcastle, NSW. APBC held separately from 2005.
2004. Volume 26. Host and Venue - University of Otago, Dunedin, New Zealand. First running of APCCM.
2003. Volume 25. Hosts - Flinders University, University of Adelaide and University of South Australia. Venue - Adelaide Convention Centre, Adelaide, SA. First running of APBC. Incorporation of ACE. ACSAC held separately from 2003.
2002. Volume 24. Host and Venue - Monash University, Melbourne, VIC.
2001. Volume 23. Hosts - Bond University and Griffith University (Gold Coast). Venue - Gold Coast, QLD.
2000. Volume 22. Hosts - Australian National University and University of Canberra. Venue - ANU, Canberra, ACT. First running of AUIC.
1999. Volume 21. Host and Venue - University of Auckland, New Zealand.
1998. Volume 20. Hosts - University of Western Australia, Murdoch University, Edith Cowan University and Curtin University. Venue - Perth, WA.
1997. Volume 19. Hosts - Macquarie University and University of Technology, Sydney. Venue - Sydney, NSW. ADC held with DASFAA (rather than ACSW) in 1997.
1996. Volume 18. Hosts - University of Melbourne and RMIT University. Venue - Melbourne, Australia. CATS joins ACSW.
1995. Volume 17. Hosts - Flinders University, University of Adelaide and University of South Australia. Venue - Glenelg, SA.
1994. Volume 16. Host and Venue - University of Canterbury, Christchurch, New Zealand. CATS run for the first time separately in Sydney.
1993. Volume 15. Hosts - Griffith University and Queensland University of Technology. Venue - Nathan, QLD.
1992. Volume 14. Host and Venue - University of Tasmania, TAS. (ADC held separately at La Trobe University.)
1991. Volume 13. Host and Venue - University of New South Wales, NSW.
1990. Volume 12. Host and Venue - Monash University, Melbourne, VIC. Joined by the Database and Information Systems Conference, which in 1992 became ADC (which stayed with ACSW) and ACIS (which now operates independently).
1989. Volume 11. Host and Venue - University of Wollongong, NSW.
1988. Volume 10. Host and Venue - University of Queensland, QLD.
1987. Volume 9. Host and Venue - Deakin University, VIC.
1986. Volume 8. Host and Venue - Australian National University, Canberra, ACT.
1985. Volume 7. Hosts - University of Melbourne and Monash University. Venue - Melbourne, VIC.
1984. Volume 6. Host and Venue - University of Adelaide, SA.
1983. Volume 5. Host and Venue - University of Sydney, NSW.
1982. Volume 4. Host and Venue - University of Western Australia, WA.
1981. Volume 3. Host and Venue - University of Queensland, QLD.
1980. Volume 2. Host and Venue - Australian National University, Canberra, ACT.
1979. Volume 1. Host and Venue - University of Tasmania, TAS.
1978. Volume 0. Host and Venue - University of New South Wales, NSW.


Conference Acronyms

ACDC - Australasian Computing Doctoral Consortium
ACE - Australasian Computer Education Conference
ACSC - Australasian Computer Science Conference
ACSW - Australasian Computer Science Week
ADC - Australasian Database Conference
AISC - Australasian Information Security Conference
AUIC - Australasian User Interface Conference
APCCM - Asia-Pacific Conference on Conceptual Modelling
AusPDC - Australasian Symposium on Parallel and Distributed Computing (replaces AusGrid)
CATS - Computing: Australasian Theory Symposium
HIKM - Australasian Workshop on Health Informatics and Knowledge Management

Note that various name changes have occurred, which have been indicated in the Conference Acronyms sections in respective CRPIT volumes.


ACSW and AUIC 2012 Sponsors

We wish to thank the following sponsors for their contribution towards this conference.


CORE - Computing Research and Education, www.core.edu.au

RMIT University, www.rmit.edu.au/

Australian Computer Society, www.acs.org.au

University of South Australia, www.unisa.edu.au


Contributed Papers


Website Navigation Tools – A Decade of Design Trends 2002 to 2011

Chris J. Pilgrim
Centre for Computing and Engineering Software Systems, Faculty of ICT
Swinburne University of Technology
PO Box 218, Hawthorn, 3122, Victoria
[email protected]

Abstract

The World Wide Web Consortium describes the Web as "the universe of network-accessible information, the embodiment of human knowledge" (W3C, 2011). This vision of the Web is contingent on the ability of users to freely access and contribute to the overall system. The freedom of the Web threatens its own future due to the possibility of users being disoriented and cognitively fatigued when trying to locate desired information. Appropriate support for navigation is required if the Web is to achieve its vision.

One challenge confronting website designers is to provide effective navigational support at the local level. Supplemental navigation tools such as search, sitemap and index tools are frequently included on websites to support navigation. However, there is a lack of detailed guidelines for the design of such tools. Instead, changes in design appear to occur by natural evolution with a 'survival of the fittest' approach.

This paper reports on a longitudinal survey of the design of website navigation tools within commercial websites over the past decade. The survey exposes several trends in design practice, particularly in recent years. The intention of this survey is to provide a sounder basis for future research and development of website navigation tools by clarifying existing research and identifying important issues for future investigation.

Keywords: Website design, navigation tools, search, sitemaps, indexes.

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126. H. Shen and R. Smith, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

1 Introduction

Web navigation is a two-stage process involving initially finding a website that relates to an area of interest, and then, secondly, locating the information within the individual website. The initial stage of navigation generally uses global search tools (Nielsen, 2000) that provide users with a list of candidate websites. The second stage of navigation involves users navigating through individual websites using a combination of both local search tools and page-to-page browsing (Katz and Byrne, 2003).

The navigation tools that are available to the user at the local level include the functions that are provided by the browser software and those that are incorporated into the website by the developer of the site.

Web browsers generally only include limited navigation tools such as back and forward buttons, history lists, bookmarks, colour coding indicating visited/unvisited links, the home button and the URL field. These methods present navigational choices to the user, utilising the self as the frame of reference. This 'inside-out' view of the information space is a result of the Web being a 'page-oriented' hypertext-based system. Browsers typically provide no feedback about the context of the currently displayed page within the total information space, nor do they provide any alternative views of the site being visited. Users, when lost, will attempt to find their way back to a previously visited page, resulting in inappropriate use of the Back button (Cockburn et al., 2003) and reluctance to explore further (Ayers & Stasko, 1995). Browser software does not provide the facilities to visualise the inter-relationships between pages, preventing users from answering questions such as 'Where am I?', 'Where can I go from here?' or 'Which pages point to this page?' (Bieber et al., 1997). This lack of knowledge of the overall structure of the site can result in confusion and cognitive overload when users jump from one location to another in the Web (Mukherjea and Foley, 1995), or encounter multiple paths to the same or different endpoints (Hedberg and Harper, 1992). The lack of location information can result in a condition that Jul and Furnas (1998) describe as "desert fog", where a navigator is in a situation where the immediate environment is totally devoid of navigational clues that might be useful to the traveller.

Website navigation tools are included in websites by developers to assist users in achieving orientation and moving in a website towards a desired target. The three most common website navigation tools are search tools, sitemaps and indexes.

Website search tools allow users to search the current site for those pages that match a desired search string. These tools generally provide users with a ranked list of pages that match the search criteria.

Sitemaps are a visual representation of the architecture of a website, providing users with either an overview of the major headings of the content or a view of the physical structure of the site. Sitemaps may be considered similar to the table-of-contents of a book by providing a list of the major categories of information (i.e. chapters) and their subsections. Sitemaps improve spatial context, reduce disorientation and support users when they are attempting to initially orient themselves in a website (Shneiderman, 1997).


Whilst sitemaps may be considered similar to a table-of-contents provided at the front of a book, it may be presumed that an index of a website would be presented as an alphabetical list of the contents of the site.

Usability problems relating to the lack of a global navigation structure and inadequate locational feedback from browsers are compounded by the desire for flexibility of access and control and the vast size of the Web. As a consequence of these factors, users are prone to suffer from disorientation and cognitive overhead whilst navigating the Web.

Disorientation within websites is a problem that may never be solved, but it may be alleviated through the provision of aids and tools that minimise the cognitive load of the task of navigating. Interfaces and tools that support navigation through websites need to be designed with due consideration to the nature of the navigational problems, and supported with a strong theoretical and empirical background. It is only through a considered design process that appropriate navigation aids will be developed which are sensitive to the context of the site, reducing cognitive overhead and disorientation in users (Ahuja and Webster, 2001). This paper reviews design guidelines for website navigation tools and then reports on a survey of design practices over the past decade in order to identify emerging trends and patterns. The identification of any trends in design practice will provide a sounder basis for future research and development to improve website navigation tools.

2 Website Navigation Tool Design Guidelines

Design guidelines provide a framework that guides designers towards making sound decisions (Preece et al., 1994) and hence are essential to designers and developers who, under the pressure of budgets and timelines, cannot afford to empirically test every design feature that they implement. Design guidelines are particularly important in the development of websites, since the nature of the Web means that it can be difficult to access a target user group for usability tests.

Since the inception of the Web there has evolved a range of website navigation tools with a variety of visual properties and functional abilities. Xu et al. (2001) report that "although there are many visualisation and web navigation tools, design guidelines for such navigation visualisation systems are rarely reported".

There are two kinds of guidelines: high-level guiding principles and low-level detailed rules. A common criticism of user interface design guidelines is that the advice provided is either too general, so that it is difficult to apply to a specific case, or too specific, so that it cannot be widely applied (Beier and Vaughan, 2003). Current web design guidelines appear to either lack any reference to, or provide only limited high-level advice regarding, the design of navigation tools such as search, sitemaps and indexes.

For example, the "Web Style Guide" (Lynch and Horton, 2009) is a well-known set of Web design guidelines. The third edition of these guidelines has some advice regarding the design of site search tools; however, the guidelines do not appear to mention sitemaps or indexes at all. The previous second edition did provide some limited advice regarding the design of table-of-contents pages and sitemap tools; however, these sections have been removed in the most recent edition.

The "Web Design and Usability Guidelines" (HHS, 2006) also provide reasonable advice regarding the design of search tools; however, the advice regarding sitemaps is limited to the following: "Use site maps for Websites that have many pages. Site maps provide an overview of the Website. They may display the hierarchy of the Website, may be designed to resemble a traditional table of contents, or may be a simple index."

The UsabilityNet (UsabilityNet, 2006) guidelines contain little more than the following statements regarding search and sitemap tools: "On larger sites consider providing a search facility - many users habitually use search rather than exploring a site" and "Provide a sitemap or overview - this helps users understand the scope of the site."

The Australian Government Information Management Office (AGIMO, 2011) provides a range of "Better Practice Checklists" to inform Web design practice for Australian Government websites. The checklist for Website Navigation includes the following advice regarding the provision of options for finding information: "Because users approach information on a website differently, agencies should provide users with a variety of ways to get to information. Examples include: embedded links, a sitemap giving an overall view of the site, A-Z indexes and a search facility" (AGIMO, 2011). The AGIMO site also contains a description of the most common navigation tool types, including "supplemental navigation which comprises additional navigation tools such as sitemaps, indexes and guides." Apart from this general advice, there are no specific guidelines regarding the design or development of each type of tool.

This lack of specific advice regarding the design of website navigation tools has left design practice open to evolutionary change, possibly with a 'survival of the fittest' approach. The intention of the longitudinal survey reported here is to provide a comprehensive overview of the current design practices of website navigation tools.

3 Survey Methodology

A longitudinal survey has been conducted to examine the trends in the design and implementation of website navigation tools in the websites of large commercial companies over the past decade. An initial survey was conducted in 2002, extending an approach utilised by Russell (2002). The survey was repeated in October 2006 and again most recently in July 2011. The survey methodology examined the websites of the top 300 companies in the Fortune list of top companies. The 2002 survey reported on 299 websites, as one site was not available during the survey period. The 2006 survey examined the exact same sites as those surveyed in 2002 and reported on 297 websites, with three sites having closed down since 2002. The most recent 2011 survey again examined the same set of websites and reported on 266 websites. In the 2011 survey there were 31 websites that had either closed down or had been taken over by a different company since the 2006 survey. It is assumed that the recent global financial crisis may have been responsible for many of these closures or take-overs, as many of the sites that had become unavailable related to financial institutions.

The survey method used a previously reported taxonomy checklist that systematically evaluated the presence and general design features of each type of navigation tool (search, sitemap, index). The results of the surveys are presented in Tables 1, 2 and 3.
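To make the checklist-based tally concrete, the following is a minimal Python sketch of how per-site checklist records might be aggregated into the percentages reported in the results; the record fields and the sample data are illustrative assumptions, not the actual survey instrument or survey data.

```python
from dataclasses import dataclass

@dataclass
class SiteRecord:
    """One row of a taxonomy checklist for a surveyed website (hypothetical fields)."""
    company: str
    has_search: bool     # any site search tool present
    search_is_box: bool  # search offered as a text entry box rather than a 'Search' link
    has_sitemap: bool
    has_index: bool

def summarise(records):
    """Percentage of surveyed sites providing each navigation tool."""
    n = len(records)

    def pct(field):
        return round(100.0 * sum(getattr(r, field) for r in records) / n, 1)

    return {f: pct(f) for f in ("has_search", "search_is_box", "has_sitemap", "has_index")}

# Illustrative records only -- not data from the actual survey.
sample = [
    SiteRecord("Example Co A", True, True, True, False),
    SiteRecord("Example Co B", True, False, False, False),
    SiteRecord("Example Co C", False, False, True, False),
]
print(summarise(sample))
# {'has_search': 66.7, 'search_is_box': 33.3, 'has_sitemap': 66.7, 'has_index': 0.0}
```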

4 Results

4.1 Search Tools

Table 1 shows a steady increase in the provision of site search tools on the surveyed websites over the past decade. In the 2011 study, 83.8% of the surveyed sites provided a search tool. Between 2006 and 2011 there were 41 companies that added a search tool to their website and 11 companies that removed one, indicating some active decision making regarding the value of a search tool. One significant change that has occurred over the survey period is the method of providing a search tool. In 2002, 32% of the sites provided a 'Search' link which had to be clicked in order to display a page containing a text entry box. In 2011 only 8.1% of the sites provided a link, with the overwhelming majority providing a text entry box as part of the general website template, avoiding the need for users to open a dedicated 'search' page.

Table 1: Search Tools

4.2 Sitemap Tools

The survey results for 2011, as shown in Table 2, show a considerable increase in the number of websites in the sample group that provided a sitemap tool (65.6%). The number of websites in the sample group providing a sitemap had previously been stable, with surveys showing 51.2% in 2006 and 52.8% in 2002, and Russell (2002) reporting 54% in 1999. It was noted that between 2006 and 2011 there were 65 companies that added a sitemap tool to their website and 38 companies that removed their sitemap, indicating some active decision making regarding the value of a sitemap tool.

All sitemaps in the 2011 survey were found to use a categorical approach to organising the various entries in the sitemap. There was only one website that provided an option to change the categorical display into an alphabetic list of topics. The general structure of all websites in the 2011 survey was hierarchical, with no websites using network-based structures. One website did implement a graphical approach to displaying levels in the hierarchy (General Design Type D), with lines providing a visual connection between the various levels. There has been a decline in the use of graphical/network-based formats, with Russell (2002) reporting that 11% of the sitemaps surveyed in 1999 displayed a graphical depiction of the site.

There now appears to be more of an even divide between those sitemaps that visually distinguish the levels in the hierarchy through the use of indenting (51.7%, General Design Type A) compared with those that use a table-of-contents style to set up hierarchical levels (47.7%, General Design Type B).

Table 2: Sitemaps

The results suggest that the sitemaps in 2011 have become more complex and crowded. The number of sitemaps that can be viewed on a standard resolution screen (1024x768) has increased to 32%, up from 13.2% in 2006, whilst the number of levels in the hierarchy has remained approximately the same. This change may be perceived to be beneficial to users, since a requirement to scroll to view a single map can cause the user to perform sub-optimally (Beard and Walker, 1990).

One of the most interesting trends observed over the period of the survey is the increased use of interactive controls in sitemaps. Interactive controls generally provide the ability for the user to expand and contract sections of the sitemap in order to control the extent of detail within the current view. In 2002 there were only two websites that provided interactive controls. This increased to 5 websites in 2006 and a total of 10 websites in 2011.

An additional trend that has developed in the most recent 2011 survey is the inclusion of a sitemap-style navigation bar located at the bottom of the general website template. This display is generally available on every page within the website. In 2011 there were 25 websites (9.4%) in the sample which had adopted this practice.

Table 3: Site Indexes

4.3 Index Tools

The provision of a link entitled 'Site Index' or 'Index' on the surveyed websites has reduced substantially over the survey period. In 2002 a total of 22 websites provided a site index tool, reducing to 17 websites in 2006 and finally only 6 websites in 2011. Between 2006 and 2011 there were 15 companies that removed an index tool from their website, whilst only one company added an index tool. In the previous surveys some of the site index tools actually presented the index as an alphabetical list of the site contents. In 2011 all of the available index tools presented a categorical list which was structured similarly to a standard sitemap (General Design Type D). No site provided links to both a sitemap and a site index.

5 Discussion

The finding that there has been an increase in the inclusion of site search tools in major commercial websites over the past decade is not surprising. There has been strong advice in various design guidelines recommending that site search tools be available on every page within a website. It is also reported that users have a strong preference for a text entry box rather than a link to search (Nielsen, 2001).

The survey found that not only had the use of index tools reduced substantially over the survey period, but all remaining index pages use a categorical structure rather than the expected alphabetical list of contents. The decline in the use of index tools appears to have been countered by an increase in sitemap tools.

The surveys have established that there has been a surge in the past five years in the number of websites in the sample group that now provide a sitemap tool. There also appears to be more consistency in the general design of sitemap tools, with the vast majority of sitemaps being organised as a hierarchical list of the categories of content in the website, using either indenting or columns to identify the sections and/or levels in the hierarchy.

The surveys have highlighted several trends in the design of sitemaps in major commercial websites.

5.1 Trend One: Textual Formats for Sitemaps

The first trend relates to the adoption of textual forms of sitemaps with a rejection of graphical structures.

Early sitemaps inherited their design influences from navigational tools developed for pre-Web hypertext systems. The non-linearity of hypertext systems resulted in some new usability problems, particularly in relation to disorientation and cognitive overhead (Conklin, 1987). Several novel navigational aids were developed to overcome the 'Lost in Hyperspace' challenges of hypertext structures. One innovation was the development of the 'Overview Diagram', which provided a graphical representation of the system topology. Conklin (1987) claimed that overviews provided "important measures of contextual and spatial cues to supplement the user's model of the nodes he is viewing, and how they are related to each other and their neighbours in the graph". Cockburn and Jones (1997) suggest that disorientation is alleviated through the provision of graphical overviews, as they not only help users maintain a sense of context within an information space, but also reduce cognitive overhead by providing an external representation of the user's memory of their navigation session.

The design of sitemaps in the early years of the World Wide Web adopted the graphical formats found in previous hypertext systems. Rosenfeld and Morville (1998) defined sitemaps as a "graphical representation of the architecture of a website" and maintained that a sitemap should provide a view of the site in a way that goes beyond textual representation. One of the classic examples of an early graphical sitemap is the Apple sitemap of the mid 1990s (Figure 1), which has been replaced by various textual versions over the past decade.

Figure 1: Graphical Sitemap (mid 1990s)

Whilst graphical or metaphorical styles of sitemap design may have aesthetic appeal, there is the risk that users will find these designs difficult to immediately comprehend if they are overly large or complex (Bieber et al., 1997).

The survey has established that there is a clear trend away from previous graphical designs towards textual formats. Whilst there has been some pre-Web research into textual versus graphical formats for hypertext overview maps (McDonald and Stevenson, 1997), the current trend in the design of website sitemaps appears to lack a research basis and may simply be a result of natural evolution with a 'survival of the fittest' approach. One explanation is that textual formats provide users with a familiar 'table-of-contents' structure from our experience with books (Hoffman, 1996).

5.2 Trend Two: Interactive Controls

A second emerging trend relates to the moderate increase in the number of websites in the survey group that now incorporate a sitemap with some interactive controls. Maps of physical space do not attempt to display every feature of the area being mapped true to scale, as this would result in maps that are impossible to read (Davidson, 2003). Hence, mapping is a process of the application of symbols and abstractions in order to control the complexity of the view presented to the user. Mapping virtual spaces such as websites draws on this experience, and visualisation techniques are commonly applied in order to provide an integrated view of the context and detail in a single view.

Designers of sitemaps must decide on the level of detail to be provided, with a trade-off between providing a complete view of the entire site contents, with the risk that users will get lost in the detail, or providing a narrow view, which may limit the opportunity for users to gain detailed information (Danielson, 2002).

Visualisation techniques may be used to control the complexity of the view presented to the user while still allowing exploration of lower levels. There are various techniques that may be applied, including global and local views, zooming controls and fish-eye views, to provide varying levels of detail.

The balance between presenting local detail and global structure in maps of information spaces has been a major theme in visualisation research. Hornbæk and Frøkjær (2001), in an experiment comparing three types of interfaces, found that an 'overview+detail' interface supported navigation and helped users to gain an overview of the structure of the document space. Shneiderman (1997) proposed a Visual Information Seeking strategy which involved three steps: overview first, zoom and filter, then details-on-demand. Sifer and Liechti (1999) stated that context can be maintained by providing a distortion or 'focus plus context' view. In an empirical study, Pirolli et al. (2001) found that an integrated focus-plus-context view of an information space increased search speeds, claiming that the overview provided cues that improved the probability that users would search in the right part of the space.

Figure 2: Interactive Sitemap Using Expand/Contract Controls (from 2011 survey)

The 2011 survey found two general approaches to providing interactive controls that allow the user to manage the extent of the detail within the sitemap. Four sites implemented an approach similar to that in Figure 2, which allowed the user to expand or contract sections of the sitemap.

Figure 3: Interactive Sitemap Using Filter Links (from 2011 survey)

Six websites in the 2011 survey contained sitemaps such as that shown in Figure 3, which included several section headings at the top of the sitemap that could be selected by the user to control the current sitemap view, effectively acting as a filter. A sketch of both interaction styles is given below.
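To make the logic of these two interaction styles concrete, the following is a minimal Python sketch of a hierarchical sitemap whose sections can be expanded or contracted (the Figure 2 style) or filtered down to a single top-level section (the Figure 3 style). The example tree, the text rendering and the function names are hypothetical; the surveyed sites implement the same behaviour client-side in the browser rather than in Python.

```python
# A sitemap section is a heading mapped to its (possibly empty) subsections.
SITEMAP = {
    "Products": {"Hardware": {}, "Software": {"Downloads": {}}},
    "Support": {"Contact": {}, "FAQ": {}},
    "About": {},
}

def render(tree, expanded, depth=0):
    """Render the sitemap as an indented list, descending only into sections
    the user has expanded -- the expand/contract control of Figure 2."""
    lines = []
    for heading, children in tree.items():
        marker = "-" if not children else ("v" if heading in expanded else ">")
        lines.append("  " * depth + f"{marker} {heading}")
        if children and heading in expanded:
            lines += render(children, expanded, depth + 1)
    return lines

def filter_view(tree, section):
    """Show a single top-level section fully expanded -- the filter links of Figure 3."""
    sub = {section: tree[section]}
    headings = set()

    def collect(t):
        for heading, children in t.items():
            headings.add(heading)
            collect(children)

    collect(sub)
    return render(sub, expanded=headings)

print("\n".join(render(SITEMAP, expanded={"Products"})))  # partially expanded view
print("\n".join(filter_view(SITEMAP, "Support")))         # filtered view
```

In both cases the full site hierarchy stays in place and only the rendered view changes, which is exactly the trade-off between context and detail discussed above.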

5.3 Trend Three: A Sitemap on Every Page

The final trend relates to the increasing number of websites in the sample group that have implemented a general page template that includes a sitemap-styled tool at the bottom of every page on the website. For example, the tabular display at the bottom of the website in Figure 4 provides users with a hierarchical view of the major categories of content on the website. This display is available on every page within the website, effectively providing users with a constantly visible sitemap.

Danielson (2002) investigated the effects on user behaviour of having a constantly visible sitemap implemented as a text-based contents list in a separate frame in the window. An analysis of the click-stream behaviour of subjects, including number of pages visited, revisits, back actions and distal jumps, found that the availability of a constantly visible sitemap resulted in users abandoning fewer information-seeking tasks, less use of the browser's Back button, and frequent navigational movements across the site hierarchy.

Figure 4: Constantly Visible Sitemaps (from 2011 survey)

Yip (2004) examined five different sitemap conditions, which varied on constancy of visibility and incorporation of hyperlinks, plus a no-sitemap condition. Measures of task success, completion times and numbers of nodes visited provided results suggesting that constantly visible sitemaps increased performance, especially for large websites.

6 Conclusion

The results of the longitudinal survey expose several trends in design practices for website navigation tools over the past decade. Emerging trends include the increasing use of textual formats and interactivity in sitemaps and the provision of sitemaps on every page of the website. Such trends do not appear to be supported by published empirical research or design guidelines, but rather may be examples of evolutionary design by 'natural selection'.

This paper provides website developers with an understanding of the critical design factors and recent trends regarding the design of website navigation tools. Further research is required to examine whether there is an empirical justification for the recent design trends, in order to provide developers with more confidence in their selection of website usability guidelines.

7 References

AGIMO (2011), Website Navigation - Better Practice Checklist, Australian Government Information Management Office. From: www.finance.gov.au/

Ahuja, J. & Webster, J. (2001), 'Perceived Disorientation: An Examination of a New Measure to Assess Web Design Effectiveness', Interacting with Computers, vol. 14, no. 1, pp. 15-29.

Ayers, E. & Stasko, J. (1995), 'Using Graphic History in Browsing the WWW', Proceedings of the Fourth International WWW Conference, Boston, MA.

Beard, D. & Walker, J. (1990), 'Navigational Techniques to Improve the Display of Large Two-Dimensional Space', Behaviour and Information Technology, vol. 9, no. 6, pp. 451-466.

Beier, B. & Vaughan, M. W. (2003), 'The Bull's-Eye: A Framework for Web Application User Interface Design Guidelines', Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems: CHI'03, Fort Lauderdale, Florida.

Bieber, M., Vitali, F., Ashman, H., Balasubramanian, V. & Oinas-Kukkonen, H. (1997), 'Fourth Generation Hypermedia: Some Missing Links for the World Wide Web', International Journal of Human-Computer Studies, vol. 47, no. 1, pp. 31-66.

Cockburn, A. & Jones, S. (1997), 'Design Issues for World Wide Web Navigation Visualisation Tools', Proceedings of RIAO'97: The Fifth Conference on Computer-Assisted Research of Information, Montreal, Canada.

Cockburn, A., Greenberg, S., Jones, S., McKenzie, B. & Moyle, M. (2003), 'Improving Web Page Revisitation: Analysis, Design and Evaluation', Information Technology and Society, vol. 1, no. 3, pp. 159-183.

Conklin, J. (1987), 'Hypertext: An Introduction and Survey', IEEE Computer, vol. 20, no. 9, pp. 17-40.

Danielson, D. (2002), 'Web Navigation and the Behavioral Effects of Constantly Visible Site Maps', Interacting with Computers, vol. 14, no. 5, pp. 601-618.

Davidson, R. (2003), Reading Topographic Maps. From: http://www.map-reading.com/.

Hedberg, J. & Harper, B. (1992), 'Creating Interface Metaphors for Interactive Multimedia', Proceedings of the International Interactive Multimedia Symposium, Perth, WA.

HHS (2006), Web Design & Usability Guidelines, U.S. Department of Health and Human Services. From: http://www.usability.gov/guidelines.

Hoffman, M. (1996), Enabling Extremely Rapid Navigation in Your Web Document. From: http://www.nuceng.ca/teach/format/hoffman.pdf.

Hornbæk, K. & Frøkjær, E. (2001), 'Reading of Electronic Documents: The Usability of Linear, Fisheye, and Overview+Detail Interfaces', Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: CHI'01, Seattle.

Jul, S. & Furnas, G. (1998), 'Critical Zones in Desert Fog: Aids to Multiscale Navigation', Proceedings of the 11th Annual ACM Symposium on User Interface Software and Technology: UIST'98, San Francisco, CA.

Katz, M. & Byrne, M. (2003), 'Effects of Scent and Breadth on Use of Site-specific Search on E-Commerce Websites', ACM Transactions on Computer-Human Interaction, vol. 10, no. 3, pp. 198-220.

Lynch, P. & Horton, S. (2009), Web Style Guide: Basic Design Principles for Creating Websites, 3rd Edition, Yale University Press, New Haven, CT. From: http://webstyleguide.com.

McDonald, S. & Stevenson, R. (1997), 'The Effects of a Spatial and a Conceptual Map on Navigation and Learning in Hypertext', Proceedings of the World Conference on Educational Multimedia and Hypermedia, Charlottesville, VA, AACE.

Mukherjea, S. & Foley, J. (1995), 'Visualising the World-Wide Web with the Navigational View Builder', Proceedings of the Third International World-Wide Web Conference: WWW'95, Darmstadt, Germany.

Nielsen, J. (2000), Designing Web Usability: The Practice of Simplicity, New Riders Publishing, Indianapolis.

Nielsen, J. (2001), Search: Visible and Simple, Alertbox, May 13, 2001. From: http://www.useit.com/alertbox/20010513.html.

Pirolli, P., Card, S. & Van Der Wege, M. (2001), 'Visual Information Foraging in a Focus+Context Visualisation', Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems: CHI'01, Seattle, WA, pp. 506-513.

Preece, J., Rogers, Y., Sharp, H., Benyon, D., Holland, S. & Carey, T. (1994), Human-Computer Interaction, Addison-Wesley.

Rosenfeld, L. & Morville, P. (1998), Information Architecture for the World Wide Web, O'Reilly, CA.

Russell, M. (2002), 'Fortune 500 Revisited: Current Trends in Sitemap Design', Usability News, vol. 4, no. 2.

Shneiderman, B. (1997), 'Designing Information-Abundant Websites: Issues and Recommendations', International Journal of Human-Computer Studies, vol. 47, no. 1, pp. 5-29.

Sifer, M. & Liechti, O. (1999), 'Zooming in One Dimension Can Be Better Than Two: An Interface for Placing Search Results in Context with a Restricted Sitemap', Proceedings of the IEEE Symposium on Visual Languages: VL'99, Tokyo, Japan, pp. 72-79.

UsabilityNet (2006), Design Guidelines for the Web, UsabilityNet. From: http://www.usabilitynet.org/

W3C (2011), About The World Wide Web. Available at: http://www.w3.org/WWW/

Xu, G., Cockburn, A. & McKenzie, B. (2001), 'Lost on the Web: An Introduction to Web Navigation Research', The 4th New Zealand Computer Science Research Students Conference, Christchurch, NZ.

Yip, A. (2004), 'The Effect of Different Types of Site Maps on User's Performance in an Information-Searching Task', Proceedings of the 13th International World Wide Web Conference: WWW2004, New York, NY, pp. 368-369.

Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia

9

Page 24: Australian Computer Society · 2020. 5. 27. · Table of Contents Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January

CRPIT Volume 126 - User Interfaces 2012

10

Page 25: Australian Computer Society · 2020. 5. 27. · Table of Contents Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January

Leveraging Human Movement in the Ultimate Display

Rohan McAdam
Centre for Research in Complex Systems, Charles Sturt University
Bathurst, NSW
[email protected]

Keith Nesbitt
School of Design, Communication and IT, University of Newcastle
Callaghan 2300, NSW
[email protected]

Abstract

Human movement is a “natural skill” employed to solve difficult problems in dynamics concerning the manipulation of a complex biomechanical system, the body, in an ever-changing environment. Continuous Interactive Simulation (CIS) is a technique that attempts to harness this human capacity for solving problems in movement dynamics in order to solve problems concerning arbitrary dynamical systems. In this paper we test a simple CIS environment that allows a user to interact with an arbitrary dynamical system through continuous movement actions. Using this environment we construct an abstract representation of the well-known cart-pole, or inverted pendulum, system. Next we undertake a usability trial and observe the way users explore key features of the system’s dynamics. All users are able to discover the stable equilibria, and the majority of users also discover the unstable equilibria of the system. The results confirm that even simple movement-based interfaces can be effective in engaging the human sensory-motor system in the exploration of non-trivial dynamical systems.

Keywords: Movement, Human Computation, Natural User Interfaces, Dynamical Systems

1 Introduction

“We live in a physical world whose properties we have come to know well through long familiarity. We sense an involvement with this physical world which gives us the ability to predict its properties well. For example, we can predict where objects will fall, how well known shapes look from other angles, and how much force is required to push objects against friction. We lack corresponding familiarity with the forces on charged particles, forces in non-uniform fields, the effects of nonprojective geometric transformations, and high-inertia, low friction motion. A display connected to a digital computer gives us a chance to gain familiarity with concepts not realizable in the physical world. It is a looking glass into a mathematical wonderland.” (Sutherland 1965)

These words were originally written by Ivan Sutherland during the 1960s to describe the “ultimate display”, a vision of Virtual Reality which has yet to be fully realised. Aligned with Sutherland’s vision, we have been using Virtual Environments to leverage human movement skills for the particular purpose of solving more abstract problems in mathematics.

In previous work we have described this approach as “Continuous Interactive Simulation” (CIS) since it is based on continuous feedback loops between the user and a simulation of a dynamical system (see Figure 1) (McAdam 2010, McAdam and Nesbitt 2011). These loops are typical of the sensory-motor loops associated with human movement. While human movement is naturally used to solve complex problems in movement dynamics, we aim to leverage our natural ability to learn new movement skills so that a user can explore, understand and control arbitrary systems characterised by non-linear dynamics.

Figure 1. An example of an environment used for Continuous Interactive Simulation.

The key contribution of this paper is a usability study into the effectiveness of a simple CIS environment in engaging a user in sensory-motor exploration of a non-linear dynamical system. The system chosen for this study is the well-known cart-pole system. A more general form of this system is familiar to anyone who has balanced a pole on the palm of their hand. This system was chosen because it involves non-trivial dynamics, but is also within the capabilities of most people to control (Foo et al. 2000). As a result, any failure of users to effectively engage with the system will likely be due to the way it is presented in the CIS environment rather than the dynamics of the system itself. It should be noted that the CIS environment we are using is designed to allow sensory-motor engagement with arbitrary dynamical systems that may have no physical basis whatsoever. As a result, it uses an abstract representation of system dynamics that robs the cart-pole system of its natural affordances.

In our CIS environment the cart-pole system was represented in a multi-sensory virtual environment in which the 3D phase space of the cart-pole problem was mapped to a 3D visual coordinate system. The position of a ball was used to represent the system’s current state. Stereoscopic display and 3D sound effects were used to enhance the user’s spatial cues for the location of the ball in the phase space. The user can manipulate the system state by adjusting the single control parameter, continuously moving a haptic pen constrained to a single dimension.

We carried out a usability trial where 10 users were observed as they spent 2 hours exploring the dynamics of the system. Users were asked to think aloud during the trial and were also interviewed at the end. We report on the users’ experiences and exploration strategies when they first interact with the system and how these strategies change over time. All users are able to uncover significant features in the system, namely the set of stable equilibrium points. The majority of users also discover the unstable equilibrium points that are characteristic of this system. Two of the users develop significant skill in manipulating the system during the time frame of the trial. Although a number of further studies are required, the outcomes reported here confirm that movement-based interfaces can indeed be leveraged for the exploration of non-linear dynamics.

2 Human Movement

We all continually reach, grasp, gesture, talk, and walk. From time to time we run, jump, swim, sing, dance, and play musical instruments. We write, type, point and click. We use tools, drive vehicles and control machines. We move. Moving proficiently requires that complex hierarchies of movements be mastered and integrated into sequences that achieve specific goals. A movement with a specified purpose or goal is called a skill (Magill 2007).

The ability to adapt movement to suit the conditions has been referred to as “dexterity” and defined as “finding a motor solution for any situation in any condition” (Bernstein 1996). This adaptability of movement has also been described as “physical intelligence” – the “capacity to use your whole body or parts of your body to solve a problem...” (Gardner 1993). In the face of constantly changing environmental conditions and changing goals it is this problem solving capacity that allows us to reliably perform a skill.

In more technical terms, successful performance of a movement skill requires interactive control of the dynamical system consisting of the human biomechanical system and the environment with which it interacts (Neilson and Neilson 2005). This control is performed by the human sensory-motor loop in which the central nervous system receives incoming sensory information from the sense organs and produces motor commands that cause muscles to contract.

For example, shooting a basket in basketball involves motion of the body, the arms and hands in particular, with the hands imparting a force on the ball such that it achieves a trajectory that passes through the hoop. Doing so requires that the dynamics of the physical situation be taken into account by the mechanisms underlying movement. Things to be considered include the dynamics of the human biomechanical system itself, the interaction between the hand and the ball, the ball’s trajectory through the air toward the hoop, and the dynamics of the ball’s collision with the backboard. Achieving such a feat requires the solution of difficult problems such as prediction, optimisation and control in the face of delayed and incomplete sensory information, time varying nonlinear input-output relationships, and constant disturbance (Wolpert et al. 2001).

A key feature of human movement is the ability to learn new skills. In effect, each new physical situation in which movement occurs represents a new dynamical system for which these problems need to be solved. The mechanisms for solving these problems are provided by complex structures in the central nervous system such as the cerebellum, basal ganglia and motor cortex. The process of learning a new skill involves adapting these mechanisms to a new dynamic situation such that appropriate motor commands are generated in order to produce the desired movement outcome.

Learning new movement skills is a complex process. During the usability trial we wanted to observe if and how users develop skills for controlling the abstract representation of the pole-cart system. For normal motor skills it is known that learning progresses from an initial trial and error exploration and then becomes more purposeful, consistent, stable, permanent, and adaptable over time (Magill 2007). We hoped to observe a similar pattern of skill acquisition as users learned to move within our abstract simulation.

3 Understanding Dynamical Systems

A dynamical system is a system whose behaviour can be described in terms of rules that define how the state of the system changes over time. There are numerous forms of dynamical system, such as continuous, discrete, stochastic, and so on. We are interested in continuous dynamical systems, in which the rules take the form of differential equations. These systems can be used to represent a broad range of behaviours from fields as diverse as physics, engineering, biology, economics, and sociology.
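For concreteness, such a system can be written in the standard textbook form (our notation, not taken from this paper):

dx/dt = f(x(t), u(t))

where x is the vector of state variables, u is the vector of control inputs, and f encodes the rules governing how the state evolves. The cart-pole system used later in this paper is exactly such a system, with three state variables and one control variable.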

Because the rules for a dynamical system typically involve nonlinear relationships, they can be difficult to understand and manipulate. Many tools have been developed to help solve problems concerning the control of dynamical systems. One way in which these tools vary is in the particular user expertise they engage in the problem solving process. Some tools require considerable mathematical expertise. Examples include mathematical software such as Matlab and Mathematica for analysing and solving the equations defining the behaviour of a system (MathWorks 2010, Wolfram Research 2010). Other tools take a particular problem solving technique and present it in a user-friendly way. For example, Simulink and Vensim provide a building-block approach for constructing simulations of dynamical systems (MathWorks 2010, Ventana 2010). While these tools are still essentially mathematical in nature, much of the mathematical complexity of constructing a simulation is hidden from the user. Even with tools such as these, the user may still need considerable mathematical sophistication to ensure that results are valid.

Some tools aim at leveraging domain expertise by further hiding mathematical complexity and allowing a user to formulate problems using the concepts and language of the problem domain (Houstis and Rice 2002). One such tool is RAMSES, which is designed for “non-computer scientists” studying environmental systems (ETH 2010).

Closely related to the simulation of dynamical systems is a means of visualising the behaviour of a system. This is often used as a means of illustrating a result obtained analytically. Visualisation can also be used to help identify features, such as equilibria or other patterns of behaviour, that might not be found using analytical techniques (Groller et al. 1996). Visualisation techniques have also been extended to include other sensory modalities, such as hearing and touch, to enhance the presentation of the system’s behaviour (Wegenkittl et al. 1997).

Interactive visualisation tools enable users to more rapidly perform simulations, review the results, modify the system and re-run the simulation (Zudilova-Seinstra et al. 2009). Interactive workflows support the process of exploring the behaviour of a system. However, this interaction is usually discrete in nature and directed at presentation factors such as changing rendering techniques or the user’s point of view. By contrast, computational steering (Mulder et al. 1999, Kalawsky 2009, Tennenhouse 2000) allows the user to modify the parameters of a simulation in order to explore the behaviour of a system under different initial conditions or by some form of intervention during the simulation.

In all of these examples, the human expertise being utilised is high-level and cognitive in nature. A very different, low-level form of human expertise has also been used to solve problems concerning the manipulation of constrained physical systems (Brooks et al. 1990, Witkin et al. 1990) and unstable, rigid body systems (Laszlo et al. 2000). In these cases a user’s intuitive motor learning and motion planning skills are used to manipulate a real-time simulation of a system. Our own work builds on these particular ideas.

4 Exploring Dynamical Systems

One class of problems regarding dynamical systems that can be difficult to solve concerns the manipulation of a system. Many dynamical systems provide opportunities for intervention that can alter the future course of the system. For example, a dynamical system model of an economy interacting with an environment can be manipulated by varying parameters such as tax rates, controls on emissions from industry, etc. (e.g., Kohring 2006). One question that can be asked in such a case is how to manipulate the system to achieve a specific outcome, such as maximising production without causing environmental collapse.

Techniques such as optimal control (Kirk 2004) exist to solve this sort of manipulation problem but, again, these techniques rely on certain assumptions that do not hold for all problems of this sort and, where they do apply, they demand a degree of mathematical sophistication that may be beyond a non-specialist in optimal control theory.

A more general question concerning the manipulation of a dynamical system is: given a dynamical system and opportunities for intervening in that system, what are the various ways in which it might be manipulated? Or, more simply, what can be done with the system? For example, what is the effect of various tax and emission control policies on a combined economic/environmental system? In contrast to the problem of manipulating a system to achieve a particular outcome, this is a more open-ended question inviting a more exploratory approach. Successful exploration might result in a repertoire of manipulations that illustrate the dynamic possibilities of the system given the available controls.

In our usability study we want to investigate the effectiveness of Continuous Interactive Simulation (CIS) in allowing users to both explore and identify various features of a dynamical system. Our approach makes use of the natural human ability to understand and manipulate the complex physical dynamical systems encountered in human movement. To achieve this we create an environment in which the dynamical system is presented as a “physical” object with which users can interact in purely sensory-motor terms. The behaviour of such an object will initially be unfamiliar to users, but it is intended that, through a process of sensory-motor exploration, users will be able to learn how the system behaves and how it can be controlled. This process is, of course, familiar to anyone who has attempted to learn a new physical activity. Such an approach requires no domain-specific or mathematical expertise, only the natural expertise we all have in exploring and mastering the dynamics of the physical world in which we live.

5 A Continuous Interactive Simulation

The CIS environment used in the usability study is capable of representing any continuous dynamical system consisting of up to three state variables and three control variables. This simple environment uses a desktop virtual environment to represent the state of a system using an animated phase space in which the location of a ball in 3D space represents the current state of the system. As the state of the system evolves, the ball moves through space. The control variables of the system are manipulated by the user using a continuous input device. The system is simulated in real time so that the ball moves according to the dynamics of the system under the influence of the user’s movements of the pen.
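To make the structure of this coupling concrete, the loop below sketches one plausible shape for such an environment in Python. This is our own illustration, not the authors’ code: the actual system was built on DirectX with a haptic pen, and all function names here are hypothetical. An explicit Euler step is shown only for brevity; the actual simulation, described below, used a 4th-order Runge-Kutta solver.

```python
import time

DT = 1.0 / 60.0  # both simulation and display update at 60 Hz

def euler_step(f, x, u, dt):
    # One explicit Euler step for dx/dt = f(x, u); shown only for brevity.
    return [xi + dt * dxi for xi, dxi in zip(x, f(x, u))]

def cis_loop(x0, f, read_pen, render):
    """Continuously couple the user's pen movements to a real-time simulation.

    x0       -- initial state vector (up to three state variables)
    f        -- f(x, u), the time derivative defined by the dynamical system
    read_pen -- samples the input device and returns the control variable(s) u
    render   -- draws the ball at the phase-space position given by x
    """
    x = x0
    while True:
        u = read_pen()               # continuous input device -> control input
        x = euler_step(f, x, u, DT)  # advance the simulated system one frame
        render(x)                    # ball position = current system state
        time.sleep(DT)               # crude pacing to roughly real time
```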

The desktop virtual environment is shown in Figure 1. It consists of a 3.00GHz dual-core Dell T3400 computer, a 120Hz 22-inch monitor, and a Phantom Omni 6 degree-of-freedom haptic pen for input. The 3D visualisation was implemented using Microsoft DirectX on Windows 7, with stereoscopic rendering and 3D sound effects to reinforce the ball’s position and motion in space. Stereoscopic rendering was provided by an nVidia GeForce GTX 275 video card with nVidia active shutter glasses. Sound was provided by a Logitech G51 5.1 surround sound speaker system. The dynamical system is simulated using a 4th-order Runge-Kutta solver. The simulation and the virtual environment were updated at a rate of 60Hz.
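A classical 4th-order Runge-Kutta step of the kind used here can be sketched as follows (a generic textbook implementation, not the authors’ code):

```python
def rk4_step(f, x, u, dt):
    """One 4th-order Runge-Kutta step for dx/dt = f(x, u).

    x is the state vector, u the control input (held constant over the
    step), and dt the time step -- 1/60 s for the 60 Hz update rate above.
    """
    def offset(base, k, s):
        # base + s * k, element-wise
        return [b + s * ki for b, ki in zip(base, k)]

    k1 = f(x, u)
    k2 = f(offset(x, k1, dt / 2.0), u)
    k3 = f(offset(x, k2, dt / 2.0), u)
    k4 = f(offset(x, k3, dt), u)
    return [xi + (dt / 6.0) * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
```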

The dynamical system simulated in this study was the well-known cart-pole system. This system is physically characterised by a cart that moves only in the horizontal direction. Attached to the cart by a pivot is a pole that is free to rotate (see Figure 2). There is friction in the pivot and in the wheels of the cart. The dynamics of this system can be expressed in terms of three state variables, i.e., the angular displacement of the pole, the angular velocity of the pole, and the linear velocity of the cart. There is one control variable, the force applied to the cart to move it either left or right. The system has both stable (pole hanging down) and unstable (pole balanced upright) equilibria. The full details of the equations of motion for this system are described elsewhere (Florian 2007).
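For readers who want a concrete derivative function to plug into the integrator above, the frictionless textbook form of the cart-pole equations is sketched below. Note that Florian (2007), which the paper follows, also models pivot and wheel friction, and the masses and lengths here are illustrative values of our own, not taken from the paper.

```python
import math

G = 9.81       # gravitational acceleration (m/s^2); illustrative values follow
M_CART = 1.0   # cart mass (kg)
M_POLE = 0.1   # pole mass (kg)
L = 0.5        # pole half-length (m)

def cart_pole_deriv(state, force):
    """Time derivative of the paper's state (theta, theta_dot, v).

    theta     -- angular displacement of the pole from the upright position
    theta_dot -- angular velocity of the pole
    v         -- linear velocity of the cart
    force     -- the single control variable, applied to the cart
    """
    theta, theta_dot, v = state
    total = M_CART + M_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    tmp = (force + M_POLE * L * theta_dot ** 2 * sin_t) / total
    theta_acc = (G * sin_t - cos_t * tmp) / (
        L * (4.0 / 3.0 - M_POLE * cos_t ** 2 / total))
    v_acc = tmp - M_POLE * L * theta_acc * cos_t / total
    return [theta_dot, theta_acc, v_acc]
```

With force = 0 and theta_dot = 0, all three derivatives vanish whenever theta is a multiple of π, which is where the stable (odd multiples, pole down) and unstable (even multiples, pole up) equilibria described below come from.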

Figure 2. The physical arrangement of the cart-pole system.

In our abstract representation of the cart-pole system the three state variables (the angular displacement of the pole, the angular velocity of the pole, and the cart velocity) are mapped onto the x, y, and z axes of the 3D virtual environment respectively (see Figure 3). This mapping was essentially arbitrary, based simply on the order in which the equations of motion are usually written. The position of the haptic pen (constrained to move only in the ±x direction) was mapped to the control variable representing the force applied to the cart. A virtual spring returned the pen to the zero position if the user exerted no force on the pen. The user’s field of view included six evenly spaced stable equilibria at (±π, 0, 0); (±3π, 0, 0); and (±5π, 0, 0); and five equally spaced unstable equilibria at (0, 0, 0); (±2π, 0, 0); and (±4π, 0, 0). These equilibria appear to the user as locations in space where the ball can be brought to rest. The equilibria associated with zero velocity of the cart also extend in the ±z direction, applying for other constant velocities.
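The two mappings just described, state to display coordinates and pen displacement to control force, are both simple scalings. A minimal sketch follows; the gain and stiffness values are hypothetical choices of ours, not reported in the paper:

```python
def state_to_ball_position(state):
    # The direct mapping described above: (pole angle, pole angular
    # velocity, cart velocity) land on the (x, y, z) display axes.
    theta, theta_dot, v = state
    return (theta, theta_dot, v)

def pen_to_force(pen_x, gain=10.0):
    # Pen displacement along its single permitted axis, scaled to the
    # force applied to the cart (the gain is a hypothetical value).
    return gain * pen_x

def pen_spring_force(pen_x, stiffness=0.5):
    # Haptic restoring force of the virtual spring that recentres the
    # pen when the user lets go (stiffness is a hypothetical value).
    return -stiffness * pen_x
```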

Figure 3. The abstract representation of the 3D phase space, with a ball used to mark the current state. Note that the applied force is constrained by the haptic pen to a single dimension.

Figure 4. The user’s view of the system’s response after a rapid movement of the pen. The trajectory of the ball has been reconstructed for illustration.

The effect of this abstract representation of the system is to rob it of the physical arrangement from which its behaviour can readily be deduced. Instead, users are confronted with a ball that gives no clues as to how it might behave. Users can only begin to understand how the system behaves through sensory-motor interaction. The user’s view of the behaviour of the system in response to a large movement of the pen is shown in Figure 4. None of the users in the usability study recognised the system as having its basis in the dynamics of a physical pendulum. This obfuscation of the physical character of the system served both to prevent users from guessing the behaviour of the system and to illustrate the representation of systems that have no physical basis whatsoever.

We also note that this system was chosen because it exhibits non-trivial, non-linear dynamics and yet is within human sensory-motor capabilities. A more general form of these dynamics is familiar to anyone who has tried to balance a pole on the palm of their hand. If users are not able to deal with this system in our simple CIS environment, then this is likely due to limitations in the way the system is presented in the CIS environment rather than the dynamic complexities of the system itself.

6 Usability Study

The study had three aims. The first was to answer the basic feasibility question of whether a simple CIS environment can provide sufficient sensory-motor engagement to allow users to discover important features of a nonlinear dynamical system (stable and unstable equilibria). The second aim was exploratory in nature. We wanted to observe the way users approached their investigation task. Given the unfamiliar non-linear behaviour of the system, what strategies do users take in learning to manipulate it? Finally, we wanted to identify usability issues with the interface itself and highlight key areas that would focus further development of our general approach.

A total of 10 adult users, 7 male and 3 female, were recruited from staff and students at a university. All users had normal stereoscopic vision. All were right-handed. None of the users had any experience with the analysis of dynamical systems. Each user spent two one-hour sessions exploring the behaviour of the system. At the beginning of the first session users were familiarised with the operation of the virtual environment and given the task of exploring the behaviour of the system, looking for stable and unstable equilibria. This was explained to the users as:

• Try and work out how the ball moves and to what extent you can control it with the pen.
• Try and find places where the ball comes to rest, either of its own accord (stable equilibria) or with you holding it in place (unstable equilibria).
• Mark these places by clicking the button on the pen.

In addition, users were given the following suggestions on what they might do to help them get started:

• Do nothing.
• Try small and large movements of the pen.
• Try slow and fast movements of the pen.
• Try these things at different points in the path of the ball.

Users were asked to record the location of equilibria by pressing a button on the haptic pen that left a marker at the current location of the ball (see Figure 5). Users were encouraged to “think out loud” as they explored the behaviour of the system. All user sessions were video recorded.

At the conclusion of the final session, users were asked the following general questions:

1. What was your overall impression of the experience?
2. What particular difficulties, if any, did you have in performing the task given to you?
3. The motion of the ball represents the behaviour of a real-world physical object. Do you have any idea what that object might be?

7 Results

All users were able to discover the equally spaced stable equilibria at (±π, 0, 0); (±3π, 0, 0); and (±5π, 0, 0). Once reached, these locations are characterised as places in the phase space where the ball stays at rest. Furthermore, all users discovered the set of stable equilibria related to constant cart velocities at (±π, 0, ±z); (±3π, 0, ±z); and (±5π, 0, ±z). These equilibria extended along a line in the ±z direction from each zero velocity stable equilibrium. Figure 5 illustrates one user’s progress toward identifying the stable equilibria.

Eight of the ten users were also able to identify the existence of the equally spaced unstable equilibria, lying between the stable equilibria, at (0, 0, 0); (±2π, 0, 0); and (±4π, 0, 0). Two of these users further identified, correctly, that constant cart velocity unstable equilibria extended from these zero cart velocity equilibria in the ±z direction.

These results show that the abstract representation of the cart-pole system provided by the simple CIS environment described is sufficient to allow the equilibria of this system to be identified by users with no knowledge of the analysis of dynamical systems.

Figure 5. A user’s progress toward identifying stable equilibria. The user has marked six equally spaced zero cart velocity stable equilibria in the x-direction. The user has also marked constant cart velocity equilibria extending from the two leftmost zero velocity equilibria in the ±z-direction. The trajectory of the ball has been partially reconstructed for illustration.


Figure 6. Plots of angular displacement versus angular velocity illustrating a user’s increasing familiarity with the cart-pole system. Crosses indicate stable equilibria. Circles indicate unstable equilibria. (a) Intrinsic dynamics of the system with no user input. (b) Initial interaction. (c) Deliberately perturbing the system toward another stable equilibrium. (d) Orbiting two adjacent stable equilibria in order to “probe” the intervening unstable equilibrium.

We also examined the recorded trials to try to ascertain how users learned to control the system. Individual experiences and the users’ levels of achievement varied considerably. However, the users’ pattern of engagement with the system was consistent. All users appeared to progress through similar phases of discovery corresponding to initial interaction with the system, discovery of stable equilibria, and discovery of unstable equilibria. The following sections summarise the key observations made during each of these phases. To help illustrate these learning phases, the response of the system to the input of one user is shown in Figure 6.

7.1 Initial interaction

A user’s initial attempt at interacting with the system was universally met with surprise. All users commented on two striking features of the system’s behaviour. Firstly, the ball moved in three dimensions while the pen only moved in one dimension. Secondly, the ball behaved very erratically in response to movement of the pen, often bouncing rapidly out of view if the user made a rapid movement of the pen.

After this initial surprise, users set about making exploratory movements of the pen to try to work out the relationship between movement of the pen and movement of the ball. In most cases users made continuous and often large movements of the pen rather than letting the intrinsic dynamics of the system play out without their intervention. As a result, early interaction was characterised by large erratic excursions of the ball through space, as illustrated in Figure 6b. This led some users to express a degree of frustration with the difficulty of the task. If this occurred, they were reminded of the initial suggestions they had been given, which included doing nothing.

As it became apparent to users that the relationship between their movement actions and the response of the system was not at all straightforward, they tended to adopt a somewhat more systematic approach. They would begin by allowing the ball to settle into a stable equilibrium, after which they would make a short movement of the pen and then observe the subsequent behaviour of the ball. This allowed users to perturb the system and then observe its intrinsic dynamics, which would bring the ball to rest at one or other of the stable equilibria. This approach allowed users to discover a number of the stable equilibria, although the particular equilibria discovered were largely a matter of chance.

7.2 Discovering stable equilibria

The essentially random discovery of stable equilibria characteristic of a user’s initial interaction with the system was enough to alert users to the overall topological structure of the system’s equilibria. With the knowledge that multiple stable equilibria existed, several users hypothesised the existence of additional equilibria that they had not yet located, realising that the stable equilibria were probably equally spaced.

Verifying that an equilibrium existed at a particular position required that the user manoeuvre the ball into that position. This required users to learn how to control the ball in order to put it where they wanted. Without too much trouble users were able to work out the movements required to move the ball either to the left or the right, from one stable equilibrium to another.


However, when moving between equilibria, users initially had difficulty controlling whether the ball settled at the equilibrium immediately adjacent to the starting equilibrium or at one further away. Reliably manoeuvring the ball into an adjacent equilibrium required more precise control over the magnitude and timing of the pen’s movement. Eventually all users were able to do this semi-reliably (i.e., they could move the ball into an adjacent equilibrium, but it might take more than one attempt). Three users were able to reliably move the ball from one stable equilibrium to another stable equilibrium of their choosing, as illustrated in Figure 6c.

A further feature of the system discovered by all users was the existence of additional stable equilibria for constant velocities of the cart. Users discovered these by making small slow movements of the pen starting with the ball at a stable equilibrium. The ball would move forward and backward and could be brought to rest at any point on a line extending in the ±z direction from a stable equilibrium with a fixed displacement of the pen (constant force on the cart). These constant velocity equilibria were typically discovered after the equally spaced zero velocity equilibria, although two users discovered them first.

All but one user managed to discover all of the stable equilibria in the first one-hour session. The remaining user completed their discovery of all of the stable equilibria early in the second session.

Another observation made during this phase of discovery concerned the way in which users described the behaviour of the system. Users had been asked to “think out loud” and so were forced to try to put the behaviour of the system into words. While users appeared to be trying to describe similar structures and behaviours, they used very different language to do so. A region of erratic behaviour was described as a “vortex”, a “ladder” and even “the zig-zaggy place”. The attractive regions surrounding the lines of constant velocity equilibria were variously described as “cylinders”, “channels”, “lanes”, “tunnels”, “magnets”, and “quantum wells”. The motion of the ball was variously described as “falling”, “flying”, “bouncing”, “skipping”, “floating”, and “gravitating”. Movement of the pen was described as “pushing”, “pulling”, or “flicking”.

The use of language such as this seemed to be most prominent in the early stages of exploring the system. As users became more adept at controlling the ball they seemed to spend less time talking about what they saw the system doing and more time simply interacting with it, perhaps stopping occasionally to comment on something new they wanted to demonstrate.

7.3 Discovering unstable equilibria

Eight of the ten users were able to correctly identify that an unstable equilibrium existed between each pair of stable equilibria. Discovery of these appeared to be more difficult for users and only occurred in the second one-hour session, after users had discovered the stable equilibria.

Identifying an unstable equilibrium seemed to occur when the ball came within sufficiently close proximity for it to slow to almost a complete stop before accelerating away again. Typically this would have to happen on a number of occasions before a user actually noticed the ball slowing to a near stop and deduced that there might be another equilibrium present.

In order to more precisely locate the position of an unstable equilibrium, several users adopted a “probing” strategy in which they would repeatedly launch the ball from a stable equilibrium toward the region containing the unstable equilibrium. One user in particular became quite skilled at probing the region between two stable equilibria by circulating the ball in an “orbit” around both of them, approaching the unstable equilibrium during the ball’s passage between the stable equilibria, as illustrated in Figure 6d. This allowed the user to deduce that the unstable equilibrium was located at the point directly between the stable equilibria.

The two users who were not able to identify the existence of any unstable equilibria had both been only partially successful at deliberately moving the ball between adjacent stable equilibria. Both users expressed some frustration during the second one-hour session when it became clear that they were not making any new discoveries or becoming any more adept at controlling the ball.

Of the eight users who did identify the zero cart velocity unstable equilibria, only two went on to correctly identify the constant cart velocity equilibria extending from each of these in the ±z direction. These were two of the three users who were able to reliably pass the ball between adjacent stable equilibria, suggesting that this skill may have been a prerequisite for discovering the more subtle aspects of the system’s behaviour.

An important observation during this phase of discovery was that while eight out of ten users correctly identified the existence of the unstable equilibria, no user developed sufficient skill to be able to maintain the ball at an unstable equilibrium for any length of time.

7.4 User impressions

When asked at the end of the study for their overall impression of their experience with the system, users gave feedback that varied considerably. Perhaps unsurprisingly, the users who provided the most positive feedback were those who met with most success in learning how the system behaved and how to control it. These users described the experience as “challenging”, “fun”, and “absorbing”. Users who had more difficulty typically had a less sanguine view of the experience, describing it as “frustrating”, “difficult”, “tedious”, and “really, really hard”.

A sense of progress and achievement seemed to be important to a user’s view of the experience, with one user saying that it was “quite tedious” until they discovered a new aspect of the system’s behaviour that made them want to know “what more can I do with this?”. This same user described the experience as becoming more fun as they improved, to the point where they felt that they “owned the ball”, suggesting, in their own mind at least, that they had in some way mastered the problem given to them.


All users cited the unexpected behaviour of the ball in response to movements of the pen as the chief difficulty that they had in exploring the behaviour of the system.

The users who did manage to identify unstable equilibria noted the difficulty in precisely locating the position of an equilibrium and then maintaining it. Reasons given for this difficulty included insufficient practice, that the region of space into which they had to manoeuvre the ball was too small, and that the ball was moving too fast. Interestingly, none of the users jumped to the conclusion that it might not be possible to maintain the ball in unstable equilibrium (and indeed it is possible).

At the end of the study all users were asked to speculate on the nature of the system represented by the behaviour of the ball. This proved to be very difficult and, in fact, none of the users were able to hazard a guess as to what the system might have been.

8 Discussion

The cart-pole system used in this study possesses non-trivial, non-linear dynamics. When presented in abstract form in a simple CIS environment, the system is stripped of any clues to its behaviour provided by the physical arrangement of its parts. Nonetheless, users were still able to explore the behaviour of the system and correctly identify the existence of stable and unstable equilibria. While the abstract representation made the task of discovering the behaviour of the system much more difficult than it would have been had users been presented with the system in its literal physical form, it has the distinct advantage of being able to represent any continuous dynamical system with up to three state and control variables.

The progression of users through the phases of discovery described in the previous section suggests that their learning experience was consistent with well-known models of sensory-motor learning (Magill 2007). Initial interaction with the system was characterised by deliberate experimentation with the effect of movements of the pen on the motion of the ball, characteristic of a “conscious” stage of learning. Users would make mistakes and not know how to correct them. As the study progressed, users talked less about what their hand was doing and more about what the ball was doing. This is characteristic of an “associative” stage of sensory-motor learning (Magill 2007).

Two users achieved a level of skill in the time available that might even be considered “automatic”. This observation suggests that users were indeed engaging with the system in sensory-motor terms. It also suggests that the literature on sensory-motor learning might be important in creating user experiences that facilitate the exploration and mastery of novel dynamics presented in this way. For example, what sort of feedback should be provided to users to help them understand and improve their current level of skill in manipulating a system (e.g., Salmoni et al. 1984)?

The abstract representation of the system rendered it unrecognisable to users. It also had a significant effect on the effective dynamics encountered by the users’ sensory-motor systems. The mapping of the system into the virtual environment meant that displacements of the ball in the y and z directions represented velocities of parts of the system rather than displacements. This effectively changed the order of the control relationship between the user and the system when compared with the physical incarnation of the system. For this reason, it is not at all clear whether skills obtained with the abstract form of the system would transfer to the physical system.

An important point to note is that in the process of exploring the system in order to find equilibria, users gain some facility in manipulating the system in order to move it between various equilibrium states. Indeed, this is unavoidable since the only means the users have to explore the behaviour of the system is by manipulating it. An important consequence of this is that users discovered not only the existence of equilibria, but also control strategies for achieving those equilibria.

Of course, all of the manoeuvres performed by users with the ball have an equivalent in the physical realisation of the system. For example, the “orbiting” behaviour shown in Figure 6d corresponds to the pole being swung upright so that it passes through the vertical position and falls over. The pole is then swung up again in the reverse direction back through the vertical position, and so on. As the user gets the ball closer to the unstable equilibrium position, the pole slows more at the vertical position. If the ball comes to rest at the equilibrium position then the pole has also stopped at the vertical position.

In terms of developing Continuous Interactive Simulation as a more general approach for studying arbitrary dynamical systems, this study raises a number of usability issues. Chief amongst these are the unexpected response of the system to user input and the presentation of the system in terms of scaling in space and time.

All users cited the entirely unexpected behaviour of the system as the main difficulty they had in exploring it. This can be characterised as a lack of compatibility between the response of this system and the responses of other systems with which users are familiar (Wickens and Hollands 2000). Users may have expected the pen and ball to behave as a pointing device, like a mouse, with a straightforward relationship between pen and ball behaviour. Whatever preconceptions they had, these were dashed the first time they moved the pen.

This lack of preparedness for the behaviour of the system might also be characterised as a “gulf of execution” in which the system does not provide users with the ready means to achieve what they want, i.e. steer the ball (Norman 1988). In most situations user interface design aims to minimise this gap to make an interface straightforward and obvious to use. By contrast we intend to utilise human sensory-motor learning as a means of discovering the relationship between action and system response. In effect, these types of systems require an essential gulf of execution that must be bridged by the human’s ability to acquire novel skills.

Despite this, the task of grappling with unfamiliar dynamics should not be made more difficult than necessary. The order in which state and control variables are mapped onto the axes of the 3D visualisation and the haptic pen may be a factor in making the system response more predictable. In the present study the order in which variables were mapped was essentially arbitrary. Had the linear velocity of the cart been mapped onto the x-axis of the visualisation, there would have been a strong correlation between left-right movement of the pen and left-right movement of the ball. This may have reduced the number of mysteries presented to the user by the system.

In addition to the order in which variables are mapped into the virtual environment, there is the key issue of how to scale the presentation of the system in both space and time. Indeed, scaling in both space and time was implicated by users as a reason for the difficulty they had in maintaining an unstable equilibrium.

The somewhat arbitrary scaling chosen was such that the user’s view encompassed a region of the state space extending from approximately (-20, -20, -10) to (20, 20, 10). This region was chosen so that it included a number of both stable and unstable equilibria and so that the user could explore the global behaviour of the system.

Because the behaviour of the system is simulated, it is possible to simulate it at different rates. In this study the system was simulated in real time; that is, one second of real time corresponded to one second of simulated time. It is a straightforward matter to simulate the system at a rate faster or slower than this. This detail is of critical importance when dealing with arbitrary dynamical systems in which “real time” might be measured in microseconds or centuries. In such cases scaling of time will be needed so that the behaviour of the system plays out at a rate suitable for sensory-motor interaction.
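Time scaling of this kind reduces to multiplying the wall-clock frame duration by a rate factor before integrating; a minimal sketch under that assumption (our parameter names, not the authors’):

```python
WALL_DT = 1.0 / 60.0  # one 60 Hz display frame of wall-clock time

def scaled_step(step, f, x, u, time_scale):
    """Advance the simulation by one display frame of *simulated* time.

    step       -- any one-step integrator, e.g. the rk4_step sketched earlier
    time_scale -- simulated seconds per real second: > 1 compresses slow
                  dynamics (centuries), < 1 stretches fast ones (microseconds)
    """
    return step(f, x, u, WALL_DT * time_scale)
```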

9 Conclusion

In this paper we set out to test the effectiveness of a simple CIS environment in engaging a user’s sensory-motor capabilities in exploring a non-linear dynamical system. To answer this question we developed a simple abstract representation of the cart-pole problem in a virtual environment. The interface was designed to leverage human movement for continuous interaction with an abstract representation of the simulated cart-pole system.

Ten users completed a two-hour usability trial in which they were required to explore and identify key features of the system’s dynamics. All users were able to discover the stable equilibria and the majority were also able to discover the unstable equilibria. All users were observed to follow consistent patterns of exploration typical of sensory-motor learning. Three users developed significant expertise in manipulating the system. The results confirm that even simple movement-based interfaces can be effective in engaging the human sensory-motor system in the exploration of non-linear dynamical systems.

Future work in developing CIS environments needs to address the key issue of how best to design them. Essential to their purpose is presenting users with a gulf of execution. However, this needs to be done in such a way that the problem of deciphering the relationship between their movements and the system’s response is no more difficult than it need be. One key to such design is the correct use of spatial and temporal scaling in presenting the system to the users.

10 References

Bernstein, N.A. (1996): On dexterity and its development. In Dexterity and Its Development. Latash, M.L. (ed), Lawrence Erlbaum Associates.

Brooks Jr, F.P., Ouh-Young, M., Batter, J.J., and Kilpatrick, P.J. (1990): Project GROPE - Haptic displays for scientific visualization. Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques, 177–185. ACM Press.

ETH (2010): Ramses: http://www.sysecol.ethz.ch/simsoftware/ramses/. Accessed 10 June, 2010.

Florian. R.V. (2007): Correct equations for the dynamics of the cartpole system. Technical report, Center for Cognitive and Neural Studies (Coneural), Romania.

Foo, P., Kelso, J. A. and De Guzman, G. C. (2000) Functional stabilization of unstable fixed points: Human pole balancing using time-to-balance information. Journal of Experimental Psychology: Human Perception and Performance 26(4) 1281-1297. American Psychological Association.

Gardner, H. (1993): Multiple intelligences. Basic Books, New York.

Groller, E., Wegenkittl, R., Milik, A., Prskawetz, A., Feichtinger, G., and Sanderson, W.C. (1996): The geometry of Wonderland. Chaos, Solitons & Fractals, 7(12): 1989–2006.

Houstis, E.N. and Rice, J.R. (2002): Future problem solving environments for computational science. 93–114. Purdue University Press, IN, USA.

Kalawsky, R.S. (2009): Gaining Greater Insight Through Interactive Visualization: A Human Factors Perspective. Trends in Interactive Visualization: State-of-the-Art Survey, 119–154. Springer.

Kirk, D. E. (2004): Optimal Control Theory: An Introduction. Courier Dover Publications.

Kohring, G.A. (2006): Avoiding chaos in Wonderland. Physica A: Statistical Mechanics and its Applications, 368(1):214–224.

Laszlo, J., van de Panne, M. and Fiume, E. (2000): Interactive control for physically-based animation. Proceedings of the 27th annual conference on Computer graphics and interactive techniques, 201–208. ACM Press/Addison-Wesley. New York.

Magill, R. (2007): Motor learning and control: concepts and applications. McGraw Hill.

MathWorks. (2011): Matlab and Simulink. http://www.mathworks.com. Accessed 18 June, 2011.

McAdam, R. J. (2010) Continuous Interactive Simulation: Engaging the human sensory-motor system in understanding dynamical systems. Procedia Computer Science, 1(1):1691-1698, Elsevier.

McAdam, R.J. and Nesbitt, K.V. (2011): Movement-based interfaces for problem solving in dynamics. Proceedings of the 22nd Australasian Conference on Information Systems, December 2011, In Press.

Mulder, J.D., van Wijk, J.J. and van Liere. R. (1999): A Survey of Computational Steering Environments. Future Generation Computer Systems,15(2):119–129.

Neilson, P.D. and Neilson, M.D. (2004): A new view on visuomotor channels: The case of the disappearing dynamics. Human movement science, 23(3-4):257-283.

Norman, D.A. (1988): The Psychology of Everyday Things. Basic Books, New York.

Salmoni, A.W., Schmidt, R.A. and Walter, C.B. (1984): Knowledge of results and motor learning: A review and critical reappraisal. Psychological Bulletin, 95(3):355–386.

Sutherland, I.E. (1965): The Ultimate Display. IFIP'65, International Federation for Information Processing, vol. 2, 506–508.

Tennenhouse, D. (2000): Proactive computing. Communications of the ACM, 43(5): 43–50.

Ventana. (2010). Ventana Systems Inc. Vensim. http://www.vensim.com. Accessed 12 Dec, 2010.

Wegenkittl, R., Löffelmann, H., and Gröller, E. (1997): Visualizing the behavior of higher dimensional dynamical systems. Proc. of IEEE Visualization ’97, pp 119–125.

Wickens, C.D. and Hollands, J.G. (2000): Engineering psychology and human performance. Prentice-Hall. New Jersey.

Witkin, A., Gleicher, M., and Welch, W. (1990): Interactive dynamics. Proceedings of the 1990 Symposium on Interactive 3D Graphics, 11–21.

Wolfram Research. (2011): Mathematica. http://www.wolfram.com. Accessed 14 July, 2011.

Wolpert, D.M., Ghahramani, Z., and Flanagan, J.R. (2001): Perspectives and problems in motor learning. Trends in Cognitive Sciences, 5(11): 487–494.

Zudilova-Seinstra, E., Adriaansen, T., and van Liere, R. (2009): Overview of Interactive Visualization. In Trends in Interactive Visualization: State-of-the-Art Survey, pp 3–15. Springer.


Website accessibility: An Australian view

Jonathon Grantham*, Elizabeth Grantham†, David Powers*

*School of Computer Science, Engineering and Mathematics, Flinders University of South Australia
†School of Education, Flinders University of South Australia
PO Box 2100, Adelaide 5001, South Australia
[email protected]

Abstract

For nearly 20 years Australian and international legal requirements have existed around the development of accessible websites. This paper briefly reviews the history of legislation against web disability discrimination, along with the current legal requirements for website development as indicated by current international accessibility specifications, and reports on a manual examination of the accessibility of 40 Australian private and governmental websites. Not one of the 20 largest Australian companies, nor any of the 20 Australian Federal Government portfolios, was found to have produced a legally accessible website as per Australian standards.

Keywords: accessibility, disabilities, Disabilities Discrimination Act, web development.

1 Introduction

“The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect.”
-Tim Berners-Lee, director and founder of the World Wide Web Consortium (W3C), 2002

Website accessibility refers to the practice of making websites accessible to all users inclusive of race, nationality, religion and disability. Website accessibility includes, but is not limited to, the communication style of the text as well as the technical development of the website. Users have come to expect web accessibility, and Huang (2002) notes that, “Access to the Internet, to a large extent, decides whether or not one can fully participate in the increasingly turbulent and networked world.” Most governments have implemented laws and policies regarding their own websites, communication plans and technology mediums. The Australian Bureau of Statistics (2009) states that 18.5% of Australians have a disability. This figure does not include the significant percentage of Australians with temporary injury or disability, nor does it cover the aging population who, although without disability, can find themselves with similar accessibility difficulties.

However, of greater significance to the field of website and application design is the percentage of individuals (estimated at 10%) who have a disability that affects their use of Information and Communication Technologies (ICT) (Royal National Institute of Blind People (RNIB), 2011). In addition, approximately 6.2 million Australians have poor literacy or numeracy skills, and of this figure, over a third (2.6 million) have very poor literacy or numeracy skills (ABS, 1996). Low literacy and numeracy skills can significantly affect an individual’s access to and understanding of websites and can, in turn, limit their ability to complete tasks such as forms and surveys online.

1.1 Why develop accessible websites?

There are social, economic and legal arguments in favour of the development of accessible websites. Traditionally, corporate social responsibility has been based around environmental impact and anti-discrimination guidelines in the workplace (Australian Human Rights and Equal Opportunities Commission (AHREOC), 2010). Social responsibility towards web accessibility seems to have been largely left up to the individual person or organisation. In 2008, an Australian Senate motion emphasised the role of the Australian government and its responsibility to “foster a corporate culture respectful of human rights at home and abroad”. This motion encouraged all government portfolios to adhere to a common standard of website accessibility.

Huang (2002) notes the economic advantages to making a website accessible. Non-accessible websites run the risk of the potential alienation of between 10% (AHREOC, 2010) and 20% (Specific Learning Difficulties (SPELD), 2011) of the population. In the competitive corporate world, website accessibility can win or lose clientele and have significant impact on a company’s profits (Loiacono and McCoy, 2004). Limited access will encourage users with disabilities to find more accessible websites offering similar products or access more expensive channels such as call centres and walk-in branches.

Aside from the possible alienation of a significant percentage of potential clientele, the development of websites that comply with disability discrimination standards can potentially increase exposure and thus, increase the number of clientele both with and without disability.

1.2 Australian legal view

"Accessible web pages promote equal access to information and opportunities"

— Spindler, 2002

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126. H. Shen and R. Smith, Eds. Reproduction for academic, not-for profit purposes permitted provided this text is included.


The Australian Human Rights Commission (AHREOC) is responsible for investigating discrimination on any grounds, including race, colour, ethnic origin, sexual preference, gender, marital status, pregnancy and disability. The AHREOC states that website owners are obliged to make websites accessible to everyone, without discrimination.

The Australian Human Rights Commission's Disability Discrimination Act advisory notes state:

“Provision of information and other material through the web is a service covered by the DDA. Equal access for people with a disability in this area is required by the DDA where it can reasonably be provided… This requirement applies to any individual or organisation developing a World Wide Web page in Australia, or placing or maintaining a web page on an Australian server… whether providing for payment or not.”

(AHREOC. Version 3.1 May 1999. World Wide Web Access: Disability Discrimination Act Advisory Notes)

Websites that do not conform to the DDA and accessibility guidelines run the risk of information provided within the website not being accessible to those who have a right to use it. Websites in which information is not accessible to all are in breach of the DDA, and therefore, the owners of the website can be prosecuted for discrimination. The most commonly referenced case of this nature is Maguire versus the Sydney Organising Committee for the Olympic Games (SOCOG). Maguire claimed that the SOCOG had created a website that was inaccessible for individuals with vision impairment. The website left individuals with vision impairment unable to access the ticketing information, event schedules or posted event results. The court ruled in favour of Maguire and under the DDA fined the SOCOG $20,000. The court case cost the SOCOG in excess of $500,000 (FCA, 2000).

On the 30th June 2010 the Minister for Finance and Deregulation, Lindsay Tanner, and Parliamentary Secretary for Disabilities, Bill Shorten, released the Website Accessibility National Transition Strategy and Implementation Plan for Australian Government agencies. The plan states that in a four year period ending in June 2014 all government department websites will meet the technical requirements of the Web Content Accessibility Guidelines 2.0 (WCAGv2, 2008). The WCAGv2 is a series of guidelines that ‘covers a range of recommendations for making web content more accessible’ (W3C, 2008). By meeting these guidelines organisations can create websites that offer accessibility for all.

South Australia and Victoria have the strictest guidelines of all Australian states in regards to disability discrimination legislation. South Australia commissioned websitecriteria (a private organisation focused on web accessibility) to write guidelines for website development, and later regulated that all South Australian Government websites must adhere to the guidelines stated by websitecriteria (2008) as well as the WCAGv2 (SAG, 2011). The websitecriteria document proposes detailed guidelines for communication style and accessibility, as opposed to just the technical, syntactic requirements of a web language covered by the WCAGv2.

The Victorian Government took a similar approach, producing the “Victorian Government Accessibility Toolkit”, a recommendation for all Victorian Government websites. The “Victorian Government Accessibility Toolkit” is mostly derived from the WCAGv2 with a significant number of criteria existing in both specifications (VGAT, 2011). There is very little in the Toolkit referring to language communication styles.

1.3 International legal view

In 1993 the United Nations released guidelines on the Equalisation of Opportunities of Persons with Disabilities. This document, although not strict law, outlines the need to meet a uniform standard in website development (ILI, 2011).

Most western countries have laws against the discrimination of people with disabilities. The United Kingdom has the Disability Discrimination Act of 1995, which was later extended by the Equality Act of 2010 (Office for Disability Issues, 2011). The United States has the Americans with Disabilities Act (1990), which prohibits any discrimination based on a person's disability.

Although Canada does not have a Disability Discrimination Act per se, it operates under the Federal Accountability Act of 2006 (CWDA, 2011). The Federal Accountability Act does not directly address website accessibility; however, it was extended by government policy that defines website management roles. The policy separates professionals involved in the development of websites into categories such as developers, graphic designers and content managers, and then places legal responsibility for the accessibility issues associated with each category. This system relies on a specific staffing structure, which causes limitations for small organisations and for larger organisations that use a different structure.

Across the European Union (EU) a mixture of disability discrimination laws are in place. The EU stated that compliance with the WCAGv2 would be mandatory by 2010 (EIS, 2006). In each of the aforementioned countries the WCAGv2 is referenced as the common website accessibility standard. The United States has an additional standard entitled "Section 508", which makes reference not only to the technical requirements for accessibility but also to the language and communication issues surrounding accessibility. Section 508 will not be considered further here; the focus of this paper is on website compliance with the internationally recognised WCAGv2.

2 Background

In a time when users are pushing for ever more advanced website functionality, websites are becoming rapidly more complicated, and less accessible for those facing difficulties. Milliman (2002) conducted a survey of webmasters with representatives from many different demographics: large and small, business-to-business and business-to-consumer, not-for-profit and profit-seeking organisations. Over 98% of websites examined in the survey failed the Bobby test (CAST, 2011) for website accessibility and thus did not comply with US Federal Regulation Section 508 nor the W3C's WCAGv2 accessibility standard.


The results of the Milliman (2002) survey also indicated that 42% of the survey population did not consider persons with disabilities as part of their target audience. Further, only about 13% of the surveyed population claimed that they had insufficient funds to make their site compliant, theoretically leaving 87% of surveyed organisations with the funding to create accessible websites but making the choice not to.

2.1 Barriers to web access

Little research surrounds the effects that disabilities can have on web accessibility. Many and varied conditions can affect website accessibility including, but not limited to: cognitive impairment, motor skill impairment, sensory impairments such as hearing and vision impairment, processing disorders and learning disorders such as dyslexia.

Vassallo (2003) notes a number of common interface design flaws that can affect access for individuals with disabilities, including: small fonts; poor contrast backgrounds (either too low or too high); large blocks of text; cluttered pages; animated images or blinking/moving text; automated page or form redirects; excessive use of capitals or italics; fully justified text (resulting in uneven spacing between words); and wordy and confusing use of English.

Assistive technologies designed to boost web accessibility cater to the varied needs of different individuals and different disabilities. Commonly used assistive technologies include: high contrast monitors, low-resolution (high-magnification) monitors, digital Braille devices, screen readers, voice recognition / digital transcribing, low sensitivity input devices, joysticks, track balls and alternate keyboards (manipulated by head movements).

The technologies and development work behind these assistive technologies have a significant effect on how developers and designers create websites. Huang (2002) notes that rules such as using the "ALT" tag when displaying an image, or avoiding calling a button or link "click here", are considered best practice because, for someone using a screen reader, "click here" conveys no context.

2.2 Methods of evaluating accessibility

Methods of evaluating website accessibility broadly fall into three categories: automatic validation / tools, manual evaluation against the WCAGv2 specification, and accessibility testing via a group of test users.

2.2.1 Automatic validation/tools

Automatic validation is by far the simplest and most cost-effective method for evaluating accessibility. Most online automatic validation tools systematically crawl through websites, measuring compliance through examining the code structure of the website.

Although this is a very easily implemented and cost-effective strategy, sites that function as applications rather than the more traditional information websites rate poorly. Websites such as Facebook initially open with a login screen asking for a username and password - a common occurrence in restricted web applications. The crawler would not have links to bypass this page, therefore rendering the site non-compliant.

A solution to this may be to temporarily disable web security during the testing and development phases. Another may be to use a client-based validator which will follow a user’s navigation path through the website; however, this process has limitations, as only pages visited by the user will be checked.

The use of automated checkers appears to be an effective method of detecting syntactic errors in coding. Killam and Holland (2001) note that in traditional information-based websites automatic checkers are less likely to miss accessibility issues. However, automated checkers do not detect or warn users about formatting, cascading style sheet, display or colour errors (Rowan et al, 2000). Automated checkers are also known to have difficulties in evaluating non-English websites (Cooper & Rejmer, 2001). None of the currently available tools check reading order, or how the website will be interpreted by a screen reader (Cooper & Rejmer, 2001).

2.2.2 Manual Evaluation against the WCAGv2

The method of manually checking a website against WCAGv2 criteria, although more cost-effective than user testing, requires more labour, in terms of training and implementation, than the use of automated validation. Familiarity with the WCAGv2 and consistency are vital for a person undertaking the role of evaluator, as this approach runs the risk of being very subjective. Manual evaluation is likely to identify a wider range of accessibility issues (Lang, 2003); however, it is less likely to highlight usability issues which may prevent users, with or without a disability, from completing their task (Killam & Holland, 2001). It has also been noted that manually checking a large number of pages is not practical and can lead to the overlooking of pages or inconsistent criteria (Rowan et al, 2000).

2.2.3 User-Based Testing

User-based testing is generally regarded as the most accurate method of accessibility testing. Although authors debate the specific methodology involved in user testing, the general concept remains consistent: a test group of users systematically work through the website, testing usability and accessibility from their point of view.

As with all testing methods, user-based testing has its limitations, and users are likely to return accessibility issues specifically related to their particular needs. A group of test subjects with vision impairment are likely to focus their feedback around text size and colour contrast whereas a test group consisting of people who have dyslexia are more likely to focus on text content, writing styles and menu systems as possible issues (LaPlant et al, 2001).

Regardless of the nature of test groups, user-based testing is likely to be the most expensive of the three methods and also poses the added challenge of finding a large enough group of diverse, experienced testers to challenge the accessibility of the website. However, this method is an effective way to uncover usability issues that affect all users, both with and without a disability (Killam & Holland, 2001).


2.3 Limitations of the WCAGv2

Colwell & Petrie (1999) investigated the accessibility of web pages developed under the WCAG guidelines. They compared the different web pages in relation to different browsers and screen readers using a test group of 15 users with vision impairment. The results showed that even though the web pages were WCAG-compliant, some major usability issues still persisted. Six out of the 15 users could not view the "ALT" text that was available (this appeared to be linked to the test subjects' experience). Other results showed that some deviations away from the WCAG guidelines actually improved accessibility.

Colwell & Petrie (1999) remarked that companies following the WCAG guidelines could develop a false sense of security as simply passing the WCAG criteria does not necessarily make a website accessible. As most western countries reference the WCAGv2 as the recognised legal document for website accessibility, this is cause for concern. Rowan et al (2000) affirm that although guidelines provide a good starting point, common sense and user testing are the most effective way to carry out accessible development. Unreflective adherence to the WCAGv2 or any other guidelines, especially in the dynamic and creative field of website development, will lead to restricted and inferior products (Sloan et al. 2006).

3 Methodology

Assessment criteria were selected to test the compliance with DDA standards of the websites of the top 20 Australian companies and the 20 Australian Federal Government portfolios. Websites were examined manually in order to assess compliance with each of the criteria.

3.1 Selection of websites

The AHREOC (1999) states that, "Equal access for people with a disability in this area is required by the DDA where it can reasonably be provided…" By choosing the top 20 Australian companies, financial hardship as a defence for noncompliance can be eliminated.

Companies can be ranked in a variety of ways including: company wealth (assets), number of employees, turnover, net profit, physical land, etc. The Australian Stock Exchange ranks the top 200 publicly listed companies by trading; however, this measurement has limited validity as it is a measure of stock trading and neglects other influencing factors of size or wealth. Therefore, for the purpose of this paper the top 20 Australian companies are derived from the Thomson Financial Worldscope database. The Thomson list is derived from roughly 1,800 publicly traded Australian companies. Companies are ranked into four equally weighted lists: biggest sales, profit, assets and market value. Companies receive points based on their rank within each category; if a company does not appear in one of the four lists, it receives no points for that category. Rank positions are then summed to create the final top 20 companies list.
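As a concrete illustration, the following Python sketch shows one reading of this ranking scheme; the company names and figures are invented, and the exact points-per-rank rule is an assumption, as the scheme's fine detail is not specified here.

# Illustrative sketch of the four-list ranking scheme described above.
# All figures are invented; the points-per-rank rule is an assumption.
companies = {
    "Alpha": {"sales": 9.1, "profit": 2.0, "assets": 50.0, "market": 30.0},
    "Beta":  {"sales": 7.5, "profit": 2.4, "assets": 80.0, "market": 25.0},
    "Gamma": {"sales": 8.8, "profit": 1.1, "assets": 65.0, "market": 40.0},
}

def points(metric):
    # Best company in a list earns the most points, the next one fewer, etc.
    ranked = sorted(companies, key=lambda c: companies[c][metric], reverse=True)
    return {name: len(ranked) - i for i, name in enumerate(ranked)}

totals = {name: 0 for name in companies}
for metric in ("sales", "profit", "assets", "market"):
    for name, pts in points(metric).items():
        totals[name] += pts

print(sorted(totals, key=totals.get, reverse=True))  # final ordering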

3.2 Assessment criteria

The WCAGv2 covers a wide range of requirements and recommendations for making website content more accessible. These guidelines cover coding, colours, size, accessibility, media, error correction and business logic. Following the WCAGv2 guidelines will ensure content is accessible to a wider range of users including those with disabilities. The guidelines specifically target vision impairment, hearing impairment, learning disabilities, cognitive limitations, physical disabilities, speech disabilities, photosensitivity and combinations of these conditions. WCAGv2 criteria have been written as non-technology-specific testable statements allowing for application across various mediums.

For the purpose of this paper, twelve criteria have been selected directly from the WCAGv2 based on experience and observations of web development industry practices. Although the chosen criteria are based on the WCAGv2, they are by no means a complete substitute for the WCAGv2. This means it is possible for a website to pass all twelve criteria used in this paper and still not meet the WCAGv2 standard. However, if a website fails any one of the chosen criteria, the website has failed to meet the WCAGv2 standard.

3.2.1 Criterion 1 – W3C validation service

Most web documents are written using a markup language such as HTML or XHTML. These markup languages are defined in the technical specifications covered in the International Standard ISO/IEC 15445 (HyperText Markup Language) and the International Standard ISO 8879 (Standard Generalized Markup Language). These technical specifications include detailed rules regarding syntax or grammar in relation to specific elements within a document. These rules include which elements can be contained inside which elements, as well as what types of data can be contained inside a specific element.

The W3C markup validation service (http://validator.w3.org/) is a free web application produced by the World Wide Web Consortium (W3C) which allows the user to enter the URL of a publicly accessible website and check whether the website meets the technical specification of the specific markup language. The W3C validator can process documents written in most markup languages including HTML 1.0 – 4.01, XHTML 1.0 and 1.1, MathML, SMIL, SVG 1.0 and 1.1. In addition to being a syntax error detector, the W3C validator will check some (but not all) of the accessibility specifications specified by the WCAGv2.

A website will be deemed to have failed on Criterion 1 if the website is found to have any errors after being passed through the W3C validator.
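A compliance check of this kind can be scripted. The sketch below submits a URL to the validator over HTTP and reads its summary response headers; the X-W3C-Validator-Status and X-W3C-Validator-Errors header names reflect the service's historical behaviour and may change, so this is an assumption rather than a stable API.

# Sketch: query the W3C markup validation service for one page and read
# its summary headers (header names assumed from historical behaviour).
import urllib.parse
import urllib.request

def w3c_validate(url):
    """Return (status, error_count) for a publicly accessible URL."""
    query = urllib.parse.urlencode({"uri": url})
    request = urllib.request.Request(
        "http://validator.w3.org/check?" + query,
        headers={"User-Agent": "accessibility-survey/0.1"})
    with urllib.request.urlopen(request) as response:
        status = response.headers.get("X-W3C-Validator-Status", "Unknown")
        errors = int(response.headers.get("X-W3C-Validator-Errors", "0"))
    return status, errors

status, errors = w3c_validate("http://www.example.com/")
print(status, errors)  # any error at all is a fail against Criterion 1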

3.2.2 Criterion 2 – Images without "ALT" tags

Section 1.4.5 of the WCAGv2 specifies that websites should not contain images of text, the exception being when the images can be visually customized to the user's requirements. This one section of the WCAGv2 alone results in non-compliance from nearly every website (including W3C's own website).


For the purpose of this paper, a website will fail against Criterion 2 if it does not contain "ALT" or alternative text for every image containing text.
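A first-pass check for this criterion can be automated. The sketch below flags <img> elements that carry no alt text at all; deciding whether a given image actually contains text still needs a human evaluator.

# Sketch: flag <img> elements with missing or empty alt text. This is
# only a first pass for Criterion 2; whether an image contains text
# still requires manual inspection.
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):  # attribute absent or empty
                self.missing.append(attrs.get("src", "<no src>"))

checker = AltChecker()
checker.feed('<img src="logo.png"><img src="office.jpg" alt="Our office">')
print(checker.missing)  # ['logo.png']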

3.2.3 Criterion 3 – Minimum colour contrast

Section 1.4.3 of the WCAGv2 specifies that the text on a website should have a contrast ratio of at least 4.5:1 for AA standard and 7:1 for AAA standard. The only exceptions are logos and trademarks, to which no minimum colour contrast applies, and large text (18pt and above), for which a lower contrast ratio of at least 3:1 is required.

Colour brightness formula:
((Red value × 299) + (Green value × 587) + (Blue value × 114)) / 1000

Colour difference formula:
(max(Red 1, Red 2) − min(Red 1, Red 2)) + (max(Green 1, Green 2) − min(Green 1, Green 2)) + (max(Blue 1, Blue 2) − min(Blue 1, Blue 2))

For a website to pass criterion 3, the text colour of all text on the home page and the “about us” page must reach at least AA standard by having a brightness difference greater than 125 and a colour difference greater than 500.

Source: http://snook.ca/technical/colour_contrast/colour.html
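The two formulas translate directly into code. A minimal sketch, using the pass thresholds stated above (brightness difference greater than 125 and colour difference greater than 500):

# Minimal sketch of the brightness and colour difference formulas,
# with the pass thresholds used for Criterion 3.
def brightness(rgb):
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) / 1000

def colour_difference(c1, c2):
    return sum(max(a, b) - min(a, b) for a, b in zip(c1, c2))

def passes_criterion_3(fg, bg):
    return (abs(brightness(fg) - brightness(bg)) > 125
            and colour_difference(fg, bg) > 500)

print(passes_criterion_3((0, 0, 0), (255, 255, 255)))        # True
print(passes_criterion_3((119, 119, 119), (255, 255, 255)))  # False: colour difference only 408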

3.2.4 Criterion 4 – Text size increase

Section 1.4.4 of the WCAGv2 specifies that, with the exception of captions and images of text, the user should be able to increase the size of the text by 200 percent without the loss of content or functionality. For the purpose of this test, "loss of content or functionality" is defined as follows: the text should be clear to read by not overflowing over another element, background image or other text.

A website will fail on Criterion 4 if, by increasing the text size by 200 percent, there is a loss of content or functionality or if the website has restricted the user from adjusting the text size by using specified font sizes in their style sheets.

3.2.5 Criterion 5 – Flash / PDFs as content

The document "Techniques for WCAGv2: Techniques and Failures for Web Content Accessibility Guidelines 2.0" specifies the accessibility best practices for Flash and PDF development. Included in the specification is the requirement that any Flash and PDF text content needs to be accessible for assistive technologies, including but not limited to: Job Access With Speech (JAWS) (4.5 and newer), Window-Eyes (4.2 and newer), Non Visual Desktop Access (NVDA) and ZoomText (8 and newer).

For the purpose of this paper, a website will fail against criterion 5 if JAWS 4.5 cannot read any text contained in Flash or PDF documents. In the event that the website does not contain any Flash or PDF documents then the website will be considered to have passed Criterion 5.

3.2.6 Criterion 6 – Breadcrumbs

Breadcrumbs are a series of hyperlinks showing the user's position and history within the website. Section 2.4.5 of the WCAGv2 specifies that there must be more than one way to locate a page within a website, with the exception of pages which are the result of a process. Section 2.4.8 of the WCAGv2 states that the user should be able to easily identify where he/she is in the website.

For the purpose of this paper a website will be regarded as failing on criterion 6 if it does not display a breadcrumb trail for pages deeper than two levels in the navigation tree.

3.2.7 Criterion 7 – Time dependent menus

Principle 2 of the WCAGv2 states that the website's user interface components and navigation must be operable. Specifically, this paper is assessing the functionality of dynamic menus. Many dynamic menus are built using a timer; hence, if the user is a slow reader or is unable to move the mouse quickly, timed menus can make a website unusable. To test this criterion, dynamic menus will be navigated by moving the mouse pointer at a slow, uniform speed over the menu. To pass this criterion, a website's dynamic menus must be operable at slow speed. The website will automatically pass against criterion 7 in the event that the website does not have any dynamic menus.

3.2.8 Criterion 8 – URL error detection

Missing pages or 404 errors can be caused by users typing in a webpage URL incorrectly or, on occasion, by poor web content or link management. Section 3.3.3 of the WCAGv2 states that any user input error should be met with a correct usage suggestion. In the situation where a user misspells the "Contact Us" URL, the website should redirect the user to a "Page not found" page which will, in turn, suggest where the user will find the "Contact Us" page. This criterion will be tested through manual attempts to access the "Contact Us" and "About Us" pages by misspelling the page URL by one character.

To pass against criterion 8 a website will need to either catch the error and provide a URL suggestion, or include a site map in a “Page not found” page. If the website does not catch the 404 error or provide a “page not found” page it will fail against criterion 8.
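This manual test can be approximated in code. The sketch below fetches a deliberately misspelled URL and applies a crude heuristic for a helpful "page not found" response; the probe URL and the marker phrases are illustrative assumptions only.

# Sketch: probe a one-character misspelling of a page URL, as in the
# Criterion 8 test. The marker phrases are a crude, assumed heuristic
# for a helpful "page not found" page.
import urllib.error
import urllib.request

def probe(url):
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")

status, body = probe("http://www.example.com/contct-us")  # deliberate typo
helpful = "site map" in body.lower() or "did you mean" in body.lower()
print(status, helpful)  # a bare 404 with no suggestion fails Criterion 8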

3.2.9 Criterion 9 – Page titles

Section 2.4.2 of the WCAGv2 states that all pages must have meaningful page titles that describe the topic or purpose of the page. This criterion will be tested by navigating through the website and observing whether the page title changes from page to page. A website will fail on criterion 9 if the page titles do not change or if the developer has not specified a page title.
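Both failure modes are easy to detect mechanically. A minimal sketch, with inline sample pages standing in for fetched HTML:

# Sketch for Criterion 9: extract each page's <title> and fail the site
# if a title is missing or the titles never change. Sample HTML stands
# in for real fetched pages.
import re

def page_title(html):
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
    return match.group(1).strip() if match else None

pages = [
    "<html><head><title>Acme Corp</title></head></html>",
    "<html><head><title>Acme Corp</title></head></html>",
]
titles = [page_title(html) for html in pages]
fails = None in titles or len(set(titles)) == 1
print(fails)  # True: both pages share one unchanging title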

3.2.10 Criterion 10 – Use of PDF / Flash forms

PDF and Flash solutions for data entry forms create usability issues for people with text readers or users who require magnification. A website will fail on criterion 10 if the forms used in the "Search" or "Contact Us" functionality are found to be built using Flash or PDF technology.

A website will pass on this criterion if there are no forms present on the website or if the forms have been built using traditional HTML.


3.2.11 Criterion 11 – Form sample answers

Section 3.3.5 of the WCAGv2 states that user input forms must contain sample answers, assuming the sample answers do not jeopardize the security or validity of the input / form. To pass against criterion 11, websites will need to have sample answers in the "Contact Us" forms and search forms. In the event that neither form is present, the website automatically passes criterion 11.

3.2.12 Criterion 12 – Form validation and bypass

Section 3.3.6 of the WCAGv2 states that all forms must provide error identification / validation. This type of validation is designed to, for example, stop a user from accidentally inputting a letter in a telephone number field, or to warn a user that he/she has entered an incorrect piece of data or omitted data. The WCAGv2 also states in section 2.4.1 that the user should be able to bypass any blocks. An example of a failure to provide a bypass is a website that features a compulsory home telephone number field.

This criterion will be tested in the context of a form on the website. Data that does not correspond to the prescribed fields will be entered and the website will be expected to provide an error message. If the website displays an error message, a bypass route will be sought.

A website will pass against criterion 12 if the form has validation and a bypass mechanism, or if the website does not contain a form. For the purposes of this paper, the authors will recognise the organisation’s contact details as a bypass mechanism.

4 Results

The results are displayed in the tables below. Both tables show criteria one through twelve along the top and indicate a pass or a fail of each criterion with a tick (✓) or a cross (✗) respectively.

Table 1 shows the 20 largest private Australian companies, as derived from the Thomson Financial Worldscope database, represented as A–T.

Table 2 shows the 20 organisations which make up the Australian Federal Government portfolio, represented as A–T.

  1 2 3 4 5 6 7 8 9 10 11 12
A ✗ ✗ ✗ ✓ ✓ ✓ ✗ ✓ ✓ ✗ ✗
B ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
C ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
D ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗ ✗
E ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗
F ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗
G ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
H ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
I ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✓
J ✗ ✗ ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✗
K ✗ ✗ ✗ ✓ ✓ ✓ ✗ ✗ ✓ ✗ ✗
L ✗ ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
M ✓ ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✗ ✗
N ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
O ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
P ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗
Q ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✗
R ✗ ✗ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗
S ✗ ✗ ✗ ✓ ✓ ✗ ✓ ✓ ✓ ✗ ✗
T ✗ ✗ ✗ ✓ ✓ ✗ ✓ ✓ ✓ ✗ ✗

Table 1: 20 largest Australian private companies

  1 2 3 4 5 6 7 8 9 10 11 12
A ✓ ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓
B ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
C ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓
D ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
E ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
F ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓
G ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
H ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
I ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
J ✗ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
K ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
L ✗ ✗ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓
M ✗ ✗ ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓
N ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
O ✓ ✗ ✗ ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓
P ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Q ✗ ✗ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✓
R ✗ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✗ ✓
S ✗ ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓
T ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✗ ✓ ✓ ✓

Table 2: Australian federal government portfolio


5 Discussion

When reviewing the results it is important to remember that the criteria are not comparable, and that individuals with different needs will place different importance on certain criteria. A user who relies on a screen reader will regard criterion one (W3C checker) as of higher importance than colour contrast (criterion 3); however, this may not be the same for another user.

The Australian government is in the process of enforcing the WCAGv2, and this is evident with three government portfolios achieving a pass in criterion one. A number of websites, both government and private industry, failed on criterion one with only one or two errors. It is possible that when these websites were developed they did meet criterion one (W3C checker) but through normal content editing and content changing, minor mistakes were made, resulting in the website failing to meet criterion one. Content Management Systems (CMS) have been largely blamed for this; however, it would be unfair to say that this is the CMS’s fault as by and large they are designed to the WCAGv2 specification. A more likely reason for the error is that a content editor has made a process mistake. An example of this would be adding an image without including the ‘ALT’ text: this caused at least two websites to fail criterion one.

This is an issue which can be easily addressed with adequate staff training. Although the authors take issue with the specific level of government dictation and specification, the Canadian system of specifying website management roles (developer, content manager, designer) and assigning legal responsibility has merit. Companies and government departments would benefit from assigning specific individuals the responsibility of maintaining sections of website accessibility.

Criteria 10, 11 and 12 are based around the accessibility of web forms. There is no legal requirement for a corporate or a government website to include a "contact us" form, and it was noted that the government portfolio websites were less likely to include them. This is a limitation of the methodology used in this paper, in that the criteria used reward websites with less functionality. Because of this, it is in an organisation's interest to limit the use of technically "clever" designs, as these increase the likelihood of creating accessibility issues. Taking the example in criteria 11 and 12 surrounding the provision of sample answers to form questions: by providing the example of 'Joe Bloggs', it could be argued that the user may be inclined to copy the example rather than entering their own data, thus raising questions around the validity of the form. The WCAGv2 also instructs that a "bypass" capability should be available for required fields. Taken literally, this means that if a user mistypes a password when asked to confirm it a second time, the user should be able to bypass the password confirmation step. This is an example where following the accessibility guidelines too closely will result in an inaccessible website.

The results show a general trend for federal government websites to be more accessible than websites in private enterprise. This can partly be explained by the federal government's unwillingness to use "contact us" forms and technically challenging designs. Website accessibility is a complicated problem and is specific to individual users; therefore, as website content keeps changing it will be near impossible to make a completely accessible website. That being said, it is the authors' belief that there is no excuse for making a website which is syntactically flawed, and that passing the W3C automated checker should become an industry standard.

6 References

ABS (1997): Australian Bureau of Statistics (ABS). http://www.abs.gov.au/ausstats/[email protected]/mediareleasesbytitle/F8B88F8E13D8FC64CA2568A900136245?OpenDocumeent Accessed 2 Aug 2011.

AHREOC: Australian Human Rights Commission (2010): World Wide Web Access: Disability Discrimination Act Advisory Notes. http://www.hreoc.gov.au/disability_rights/standards/www_3/www_3.html Accessed 4 Aug 2011.

AwD: Americans with Disabilities Act 1990. http://www.ada.gov/pubs/adastatute08.htm Accessed 6 Aug 2011.

CAST: Centre for Applied Special Technology – Bobby. http://www.cast.org/learningtools/Bobby/index.html Accessed 8 Aug 2011.

Colwell, C. & Petrie, H. (1999). Evaluation of Guidelines for Designing Accessible Web Content. Position paper INTERACT’99 Workshop: Making Designers Aware of Existing Guidelines for Accessibility.

Cooper, M. & Rejmer, P. (2001): Case study: localization of an accessibility evaluation [Electronic version]. ACM CHI 2001 Conference on Human Factors in Computing Systems.

CWDA: Canadians With Disabilities Act http://www.canadianswithdisabilitiesact.com/ Accessed 6 Aug 2011.

EIS: Europe's Information Society (EIS) – Riga Ministerial Conference (2006). http://ec.europa.eu/information_society/activities/einclusion/events/riga_2006/index_en.htm Accessed 11 Aug 2011.

FCA: Federal Court of Australia (2000): Maguire v Sydney Organising Committee for the Olympic Games. http://www.austlii.edu.au/au/cases/cth/federal_ct/2000/1112.html Accessed 7 Aug 2011.

Huang, C. J. (2002): Usability of E-Government Web-Sites for People with Disabilities. 36th Hawaii International Conference on System Sciences. Shih-Hsin University.

ILI: Independent Living Institute – No more exclusion: international day of persons with disabilities. http://www.independentliving.org/docs6/idf-world-dev.html Accessed 8 Aug 2011.

Killam, B. & Holland, B. (2001): Position Paper on the Suitability to Task of Automated Utilities for Testing Web Accessibility Compliance. The Usability SIG Newsletter: Usability Interface Accessibility and Usability: Partners in Effective Design. April 2003.


LaPlant, W.P., Laskowski, S.J. & Stimson, M.J. (2001): Report on UPA Workshop 6: Exploring Measurement and Evaluation Methods for Accessibility. Workshop at the Tenth Annual Usability Professionals' Association Conference.

Loiacono, E. and McCoy, S (2004). Web site accessibility: an online sector analysis. Information Technology & People, Vol. 17 No. 1, pp. 87-101. Emerald Group Publishing Limited.

ODI: Office for Disability Issues HM Government - Equality Act 2010 and the Disability Discrimination Act 1995 http://odi.dwp.gov.uk/disabled-people-and-legislation/equality-act-2010-and-dda-1995.php Accessed 4 Aug 2011.

RNIB: Royal National Institute of Blind People. http://www.rnib.org.uk/ Accessed 2 Aug 2011.

Rowan, M., Gregor, P., Sloan, D. & Booth, P. (2000): Evaluating web resources for disability access. The Fourth International ACM Conference on Assistive Technologies (ASSETS 2000). Virginia: ACM.

SAG: South Australian Government – Applications and internet (2011). http://www.sa.gov.au/government/entity/1670/About+us+-+Office+of+the+Chief+Information+Officer/What+we+do/Policies,+standards+and+guidelines/Applications+and+internet Accessed 6 Aug 2011.

Sloan, D., Heath, A., Hamilton, F., Kelly, B., Petrie, H. & Phipps, L. (2006): Contextual Web Accessibility – Maximizing the Benefit of Accessibility Guidelines.

Spindler, T. (2002): Accessibility of Web Pages for Mid-Sized College and University Libraries. Librarian Publications, University Libraries, Roger Williams University.

Lang, T. (2003): Comparing website accessibility evaluation methods and learnings from usability evaluation methods. Peak Usability.

Tanner, L. (2010): Government Releases Website Accessibility National Transition Strategy. http://www.financeminister.gov.au/archive/media/2010/mr_372010_joint.html/mr_052010_joint.html Accessed 2 Aug 2011.

Thomson Reuters (2011): Worldscope fundamentals. http://thomsonreuters.com/products_services/financial/financial_products/a-z/worldscope_fundamentals/ Accessed 3 Aug 2011.

VGAT: Victorian Government Accessibility Toolkit (2011): eServices Unit, Information Victoria, Department of Business and Innovation. Ver 3.1.1.

WCAGv2: W3C: Web Content Accessibility Guidelines (WCAG) 2.0. http://www.w3.org/TR/WCAG20/ Accessed 2 Aug 2011.


Merging Tangible Buttons and Spatial Augmented Reality to Support Ubiquitous Prototype Designs

Tim M. Simon1 Ross T. Smith1 Bruce Thomas1 Stewart Von Itzstein1 Mark Smith2

Joonsuk Park3 Jun Park3

1 School of Computer and Information Science, University of South Australia,
PO Box 2471, Adelaide, South Australia 5001
Email: [email protected], [email protected], [email protected], [email protected]

2 School of Information Computer Science, Royal Institute of Technology,
Kungl Tekniska Högskolan, SE-100 44 Stockholm
Email: [email protected]

3 Department of Computer Science, Hongik University,
Seoul, Korea
Email: [email protected], [email protected]

Abstract

The industrial design prototyping process has previously shown promising enhancements using Spatial Augmented Reality to increase the fidelity of concept visualizations. This paper explores further improvements to the process by incorporating tangible buttons to allow dynamically positioned controls to be employed by the designer. The tangible buttons are equipped with RFID tags that are read by a wearable glove sensor system to emulate button activation for simulating prototype design functionality. We present a new environmental setup to support the low cost development of an active user interface that is not restricted to the two-dimensional surface of a traditional computer display. The design of our system has been guided by the requirements of industrial designers, and an expert review of the system was conducted to identify its usefulness and usability aspects. Additionally, the quantitative performance evaluation of the RFID tags indicated that the concept development using our system to support a simulated user interface functionality is an improvement to the design process.

1 Introduction

This paper describes a methodology that provides designers with an interactive physical user interface that is employed for mock-up creation. The initial concept of using RFID tags for dynamically positionable buttons was presented by Thomas et al. in 2011 [21]. The novel contribution of this paper is the development and evaluation of an interactive design system employing Spatial Augmented Reality (SAR) for appearance presentation, tangible buttons for enhanced user interface fidelity, vision tracking to capture placement of user interface controls, and our wearable RFID enhanced glove with fingertip read resolution to support emulated button presses. Our methodology allows designers to dynamically refine a design by rearranging the physical components of a user interface, virtually change the appearance of the tangible user interface, and emulate user interface functionality. This approach allows the designer to instantiate their ideas in a haptically rich form as early as possible in the design process. Figure 1 shows a non-planar white surface with a blue projected SAR appearance, movable tangible buttons and the wearable RFID glove in use.

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126, Haifeng Shen and Ross Smith, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Initial explorations into combining Spatial Augmented Reality into the industrial design process have shown promising results by extending currently employed design methodologies [15, 22]. A common SAR prototyping practice employs an approximate physical model that is augmented with perceptively correct projected digital images to enhance the appearance. The projected digital images provide fine-grain details of user interfaces such as virtual buttons, dials, annotations and finishing effects. A significant benefit to using SAR over a CAD software system is that the physical models of SAR systems provide simple passive haptic feedback, allowing the user to touch the computer generated mock-up while it is being created. Also, unlike pure physical mock-ups that are painted for presenting finishes, the SAR appearance can be modified instantly by modifying the projected image. This is particularly powerful for industrial designers since they can maintain the hands-on nature of physical prototyping and also gain benefits, such as unlimited undo operations, that computer systems provide.

Figure 1: RFID glove used with SAR projected dome mock-up.

In the above description there are a number of limitations that our collaborating industrial design experts have identified that curb the use of SAR systems for the design of mock-ups. For example, a stove appliance may initially be modelled using a rectangular box as the SAR substrate, but design mock-ups would at some stage require physical buttons and dials for the client to feel and experiment with. A drawback to previous SAR design systems is that the fidelity of the haptics felt by the user is limited to fixed surfaces and shapes. For example, in a previous study that validated the use of virtual SAR controls for design, Porter et al. reported that participants collectively perceived projected virtual buttons as less realistic than physical buttons [15]. Participants of the experiment commonly identified the need for improved tangible feedback so that they knew they had actually touched and activated a button. This indicates designers of physical interfaces would benefit from merging the configurability of a purely virtual design with the tangibility of a physical design tool.

This paper reports on our work on improving the fidelity of the haptics in SAR systems by using a wearable RFID technology to combine functional tangible buttons with projected SAR mock-ups. By using unattached individual tangible buttons, the designer maintains the ability to re-configure aspects of the mock-up design but, unlike previous implementations, they can physically pick up the tangible buttons that compose the user interface and re-configure them until the desired layout is reached. While RFID readers have previously been embedded in gloves [9, 14, 18, 20], the novelty of this use of a glove-based RFID reader is its ability to emulate tangible button presses.

We envision mock-up designs may use hundreds of potential tangible buttons of different sizes and shapes, for example a mock-up audio equalizer board. Using this new technology approach, this is easily achievable and cost effective using RFID tags. For each unique design that is created using this methodology, only the physical substrate (a white blank), the textures used for the appearance and the logic behind interactive controls need to be developed. Application software optionally may be constructed with our API to simulate the design's functionality. The system reported in this paper provides the following features that address the challenges outlined above:

1. Employing tangible buttons for improved haptic fidelity.

2. Provides easy to move physical controls to the designer for mock-up layout (3DOF with the current system).

3. Presents an easy to change SAR appearance projected on the buttons.

4. RFID button presses used by the SAR design system to simulate user interface functionality.

5. Inexpensive and easy to create tangible buttons.

The paper starts with a description of the critical related research and technologies. Following this, the concept of employing the technologies of SAR and RFID for dynamic tangible button interactions for prototyping is presented. The design of the wearable RFID glove system is then presented, followed by the system implementation details. The remainder of the paper presents a performance evaluation of the fingertip read resolution and an expert review evaluation.

2 Related Work

This section describes the four areas of supporting work for this system: industrial design methodologies, augmented reality, physical user interface control prototyping and RFID technologies.

2.1 Design

Pugh's total design is an example of a readily employed methodology for the creation of prototypes. This approach consists of six fundamental design and development steps: market (user need), product design specification, conceptual design, detail design, manufacture, and sales [16]. The SAR design investigations presented in this paper focus on providing new methodologies for the concept and detail phases. In the concept phase, designers brainstorm approaches, sketch ideas and form potential designs. A selection process is then performed that rules out many potential designs. Following this, mock-ups are created for the selected designs and are shown to the customer.

2.2 Augmented Reality

Augmented Reality (AR) combines a real-world view with computer generated graphics registered to the environment. AR commonly uses head mounted or hand-held displays to present the computer generated information to the user. One limitation of these display techniques is that they do not provide the users with any haptic feedback for the computer generated information. Spatial Augmented Reality [4] is a novel form of AR that uses commercial off-the-shelf projectors to change the appearance of everyday objects. Since the physical objects are the display surface, the user experiences tactile feedback that provides a more immersive and stimulating experience. The SAR display technology presented in this paper is based on the Shader Lamps [17] technology. An extension of this technique, Interactive Shader Lamps [3], enables a user to digitally paint graphics onto a physical object.

Previous research has explored the use of SAR to enhance the industrial design processes. For example, the WARP [22] system projects onto foam models to allow designers to explore different material properties and finishes for a design prototype. Augmented Foam Sculpting [13] allows designers to simultaneously create 3D virtual and physical models by sculpting foam with a tracked hot-wire cutter. The HYPERREAL design system [11] employs SAR to visualize virtual deformations of the surfaces of physical objects. DisplayObjects [1] is a system that allows designers to project user interface controls on a prototype; this work shows the potential benefits of using SAR to improve the ability to iteratively design the visual aspects of the interfaces.

2.3 Physical User Interface Control Prototyping

There are a number of systems that have provided dynamically configurable physical environments. Pushpin Computing [5] provides wireless input modules that are pushed into a foam substrate, with power pins connecting to conductive planes beneath the foam. This makes the placement of the nodes very simple, but their size and shape cannot be dynamically changed. Additionally, a flat plane for the foam substrate is required, which limits their use on complex surfaces. Phidgets [10] provide a variety of electronic input controls and sensor modules that can be combined to create complex physical interfaces. The Calder Toolkit [12] builds on this concept with wireless input modules that can be attached to product design mock-ups.


While toolkits such as these make high fidelity prototyping faster than with entirely custom electronics, they are still inflexible compared to virtual prototyping. For example, for each module that requires a new shape (i.e. a new button form factor) another physical node and new electronics must be constructed before it can be added to the system. Avrahami et al. developed a system that employed RFID tags to provide interactive controls for industrial design prototypes. Their system did not use a glove based reader; rather, an antenna was placed on a table in a fixed position and used in conjunction with switched RFID tags to create button events. One limitation of this approach is that the working volume of the system is limited to the range of the antenna [2].

2.4 RFID Technologies

RFID readers have been incorporated into gloves and used to study application spaces spanning business, education, entertainment and medicine. One of the earliest examples of a glove mounted RFID reader system was developed by Schmidt et al. It was used to associate objects with machine generated events when handled, including the ability to invoke components of an enterprise resource management system used for business logistics [18]. Muguira et al. developed a similar system intended for conducting warehouse inventories and activity recognition [14]. Another example was developed by Tanenbaum et al. [20], where objects that are touched invoke further interaction between the user and the object. These examples all use glove mounted 125 kHz RFID readers where the aim is to recognize an entire object that is being touched or held. The spatial reading resolution of the RFID system is the entire hand and not a single finger. Systems have also been designed that operate at 13.56 MHz; a good example is the iGlove [9]. This device was described in two versions, with the first version having the antenna on the palm of the glove, and was also used for identifying objects held in the hand. The second version was for medical applications where there was a need to know what was being held in the fingers. This version is notable for having an antenna implemented with conductive paint in the fingertip of a latex glove, and the RFID reader was also moved to the user's wrist. Although this solution provided sensing at fingertip resolution, it was reported to have poor durability.

3 Dynamic Tangible Button Interactions

As previously discussed, the interactive design system presented consists of four major technologies: a SAR prototyping system [15], RFID enhanced tangible buttons, a purpose built wearable glove with an embedded RFID reader, and a computer vision tracking system to determine the tangible button's position. The dynamic nature of the tangible buttons is their support for the designer to change a button's appearance and position in a design. This section describes the process a designer would take when developing design prototypes with tangible button interactions.

The tangible buttons we present in this paper address two basic requirements, haptic feedback and dynamic configuration. They allow the designer to physically move parts of the user interface and re-position them to obtain an optimal layout. The tangible buttons are a neutral color to allow SAR images to be projected onto them. Consider the example of designing a calculator where the designer would like to compare different button layouts and spacings. A predefined set of colors and textures of the tangible buttons can be altered via an interface to the SAR system. The designer may iteratively change appearances and placements of the tangible buttons. The movement of a tangible button is captured by our computer vision system, and the texture is projected onto the tangible button in the new location.

The functionality of the tangible buttons is supported through an embedded RFID tag in each button and the RFID reader embedded in a wearable glove with a fingertip antenna. The user interface consisting of the calculator buttons and display can all be made functional. This approach made it possible to avoid using traditional electronics embedded in the physical buttons. Instead, the RFID tags provide a generic solution and do not require any wires or a power source to be used. The tangible buttons are activated by touching the antenna finger on a button, and the RFID reader sends an ID to the simulator application. The simulator application may change the appearance of the buttons and update the display on the calculator. This process more closely simulates the interactions required for using the interface and allows the designers to assess usability aspects. Our system allows the development of new shapes and sizes of tangible buttons with technologies readily available to designers such as CAD software, 3D printers, and RFID tags. Currently this is a difficult process with systems such as Phidgets [10], as it requires knowledge of electronics design and construction.
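To illustrate the event flow just described, where a fingertip tag read stands in for a button press, here is a hypothetical Python sketch of the calculator case; the tag IDs, names and mapping are our illustrative assumptions, not the authors' actual API.

# Hypothetical sketch of the glove-to-simulator event flow: a fingertip
# tag read is treated as a button press. Tag IDs and names are invented.
BUTTON_MAP = {
    "04A1B2C3": "7",
    "04A1B2C4": "8",
    "04A1B2C5": "+",
}

class CalculatorSimulator:
    def __init__(self):
        self.display = ""

    def on_tag_read(self, tag_id):
        """Map a tag ID reported by the glove reader to a button press."""
        label = BUTTON_MAP.get(tag_id)
        if label is None:
            return  # tag was not registered during initialization
        self.display += label
        print("display:", self.display)

sim = CalculatorSimulator()
sim.on_tag_read("04A1B2C3")  # user touches the '7' tangible button
sim.on_tag_read("04A1B2C5")  # then the '+' button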

3.1 Modes of Operation

The industrial designer uses the system by following three functional phases: the pre-design phase, the initialization phase and the design phase. During these phases the industrial designer will collaborate with clients to provide specialized design considerations and functionality to the product concept.

There are four entities required in the pre-design phase before the concept design process may start. Firstly, as with any SAR process, a physical substrate to project upon needs to be constructed. This can be as simple as a white piece of paper or as complicated as a wooden framed prop. Secondly, an application is required for coordination between the SAR system and the glove reader system, which could be a generic or custom application. Thirdly, the tangible buttons with embedded RFID tags are required. Finally, a set of virtual 3D graphical models for the different finishes of the device and UI controls are constructed.

The second step is the initialization phase. Firstly, the projector and camera need to be calibrated. Secondly, the computer vision system must be informed of which RFID tag is associated with a particular tangible button. To make the tracking of the tangible buttons more robust, the buttons are not uniquely visually identified. The process is to prompt the designer to pick up a tangible button, read the RFID tag with the glove, and place the button on the prototype device for a camera to start tracking its position. This process is repeated for each tangible button employed in the mock-up. Lastly, the 3D virtual models and textures for the physical artifacts are loaded into the application.

The final design phase is where the prototype application displays textures associated with each tangible button, particular textures for the device itself, and purely simulated UI controls. An example of a purely simulated UI control is the output display in the calculator example described in the Scenario section. During the design phase, the designer may rearrange the physical positions of the tangible buttons one at a time, but they may make as many changes as desired. The designer can also change the appearance of any tangible button with a pre-loaded texture by sliding the button onto a pre-defined region. The application developed supports three different textures and displays them in this dedicated area separate from the mock-up design. We call this area the texture-loading palette. Figure 4 shows tangible buttons being used with SAR projections.

The shape of the tangible buttons can also be changed as desired. To achieve this, a new model is designed using a CAD system and constructed with a 3D printer. The designer uploads the new tangible button 3D model, new button textures, and new textures for the device itself to the application. The designer repeats the second and third phases throughout the design and evaluation processes.

4 Designing a Wearable RFID Glove System

Several characteristics of RFID systems need to be taken into account in order to successfully implement a tangible button system. Important RFID system characteristics that affect design decisions are summarized in the following list: 1) the system should be inductively coupled and operate in the near field in order to confine activation to a single button; 2) the antenna should be deployable on the user's fingertip; 3) minimal power should be used during communication; 4) the read range should be established on fingertip contact; and 5) the tag protocol should be simple with low latency.

4.1 Inductive Coupling

For a tangible button system, near field operation is desirable in order to generate an RFID read event from the user touching or virtually pressing the tangible button. The relatively long range of radiative RFID systems, covering up to tens of meters, is not appropriate for this, compared to inductively coupled RFID systems that operate over a much shorter tag-to-reader distance.

4.2 Deployable Antenna

For use in a tangible button system, a useful reader-to-tag distance would range from direct contact up to a few millimeters. Most inductively coupled RFID systems can easily cover such a range, so the choice of technology will depend on cost, power and practical deployment considerations. Several frequency ranges below 50 MHz have been identified as suitable for inductive RFID systems [8], but not all of them are internationally approved for use. The two most common carrier frequency ranges in commercial use are those at 125 kHz and 13.56 MHz.

4.3 Power Consumption

Figure 2: (a) Top view of tangible button with retro-reflective marker. (b) Glove with RFID reader.

The electrical current required to generate an acceptable magnetic field strength is directly related to the power needs of the system, and indirectly related to cost. In inductive RFID systems, reader-to-tag communication is accomplished by magnetic field coupling between antenna inductors on both the reader and the tag. The value of an inductor is chosen to create a resonant circuit on both the tag and the reader. The coil in the tag is encapsulated together with the rest of the tag electronics, but in the case of the tangible button system, the cylindrical reader antenna coil must be constructed on the fingertip of the glove. A typical value for an inductor used in a practical resonant circuit at 125 kHz could be 1 mH, representing dozens or hundreds of turns of wire around a glove fingertip, while for a circuit at 13.56 MHz a typical value could be 3 uH, which is only a few turns of wire around a fingertip. In this regard, a 13.56 MHz system would seem to be preferable to a 125 kHz system because the inductors are physically much smaller. However, this has an undesirable effect with respect to power requirements. The magnetic field strength produced by a cylindrical coil, measured at the center of the long axis of the cylinder, is given by [8]: H = IN/2R, where H is the magnetic field strength in amperes per meter, I is the current in amperes, N is the number of wire turns in the coil and R is the coil radius. As the field strength is directly proportional to the coil current and the number of wire turns in the coil, a coil with more turns results in a power advantage. Because at 125 kHz the number of wire turns in a practical coil is between one and two orders of magnitude more than the number used in a 13.56 MHz system, the amount of current required for the same magnetic field strength is substantially less. From this power viewpoint, the 125 kHz system is preferable.
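The advantage can be made concrete with a small calculation from H = IN/2R. The 275-turn figure below matches the glove coil described in Section 5.2; the coil radius, target field strength and the few-turn 13.56 MHz coil are assumed illustrative values.

    # Required coil current for the same field strength H = I*N/(2*R)
    # at the two candidate carrier frequencies.
    R = 0.008   # coil radius in meters (roughly a fingertip) -- assumed
    H = 1.5     # target field strength in A/m -- assumed

    for label, turns in (("125 kHz coil", 275), ("13.56 MHz coil", 4)):
        current = 2 * R * H / turns   # solve H = I*N/(2R) for I
        print(f"{label}: N = {turns:3d} turns -> I = {current * 1e3:.3f} mA")

Under these assumptions the 275-turn low-frequency coil needs roughly 70 times less current than the few-turn 13.56 MHz coil for the same field strength, which is the power argument made above.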

4.4 Read Range

Also important is that the area over which the tag is read should not be more than approximately the area of a fingertip, so that it does not appear that more than one tangible button is being pressed at a time. This can be accomplished if the antenna coil can be made to closely wrap around the glove fingertip, but still have a suitable number of turns in the coil to reliably generate a usably strong field. This is possible using a 125 kHz system. In comparing 125 kHz and 13.56 MHz systems in these regards, both systems can deploy reasonable coil sizes around a glove fingertip; however, the 125 kHz system has a power advantage as previously discussed.

4.5 Protocol

We are also interested in the latency of the RFID system selected. In general, 125 kHz RFID systems have less available communication bandwidth compared with 13.56 MHz systems, and often they are simpler systems, usually designed without the ability to generate tag sub-carriers, or to perform anti-collision or other tag addressing protocols. A simple, read-only identifier sent by the tag is sufficient as long as there are enough unique codes for the intended tangible button system. The absence of complex protocols and a simple data format are advantages in this regard, again indicating a preference for using a simple 125 kHz system for the tangible button system.

5 Implementation

This section describes the implementation of our tangible buttons, RFID glove and computer vision system. We employed a SAR system consisting of two NEC NP200 ceiling mounted projectors, one Sony XCD-X710CR camera (with IR filter removed), a workstation computer (AMD Athlon 64 Processor 3200+, 512 MB RAM, Ubuntu 10.10) and a white SAR substrate.

5.1 Tangible Buttons

We have constructed a custom tangible button with a retro-reflective marker and RFID tag. The 27 mm diameter tangible buttons were modeled on a CAD system, and printed using a Dimensions uPrint plus printer. Figure 2(a) shows the printed tangible marker. The top surface is fitted with a square 1 cm x 1 cm retro-reflective marker that is used by the vision system to identify the location of the tangible button. The underside of the tangible button is fitted with a 15 mm diameter circular RFID tag, allowing each button to be identified by a unique identification number.

5.2 Wearable Glove Input

The RFID system used with the glove is realized using an inductive RFID reader module manufactured by ID Innovations¹, model number ID2. In addition to having a carrier frequency of 125 kHz, the module was also chosen because it has no internal or included antenna, allowing a custom antenna to be designed for the glove. The module's operating protocol is extremely simple: it supports no user commands, and simply reports data when a tag is read. The data rate from the tag is found by dividing the carrier frequency by the number of carrier periods used to encode one bit, which in this case is 64, giving an overall data rate of 1.953 kilobits per second. Tags send 64 bits of data, of which 40 bits are their unique tag ID. With a reader data rate of 1.953 kbps, the time necessary to send the 64 tag data bits is 32.77 ms. User latency for each tangible button event will be this time plus the time necessary for the ID2 module to process and transmit the data to a computer host. A simple circuit allows the ID2 module to be read over a USB connection to a computer, and includes an activity LED that flashes when a tag is read. The power consumption of the circuit is calculated to be 140 mW, of which 65 mW is taken by the ID2 module itself.
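These timing figures can be checked directly from the stated carrier and encoding parameters:

    # Reader timing from the stated parameters: 125 kHz carrier,
    # 64 carrier periods per encoded bit, 64 bits per tag read.
    carrier_hz = 125_000
    periods_per_bit = 64
    bits_per_tag_read = 64          # 40 of these are the unique tag ID

    bit_rate = carrier_hz / periods_per_bit            # 1953.125 bit/s (~1.953 kbps)
    read_time_ms = bits_per_tag_read / bit_rate * 1e3  # ~32.77 ms

    print(f"data rate: {bit_rate:.3f} bit/s")
    print(f"time to transfer one tag ID: {read_time_ms:.2f} ms")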

In addition to the module, the other components are the connection to the host computer and the antenna. The ID2 reader is interfaced to the computer host using an FT232R asynchronous-to-USB interface circuit made by Future Technology Devices Ltd. As the ID2 has no control interface, there is only a read data path from the ID2. The FT232R can be easily substituted by a wireless device such as a Bluetooth or Zigbee radio device.

The final component of the design is the antenna inductor. It should have an inductance of 1.08 mH in order to form a resonant circuit at 125 kHz. The ID2 module provides an internal 1500 pF capacitor to form the resonant circuit with the coil, although the implementer can add external capacitance to allow other coil designs to be used. The 1.08 mH coil used with the glove consists of 275 turns of #33 enameled magnet wire scatter wound by hand on the index fingertip of the glove, which forms an ideal orientation for the generated magnetic field lines to be used in a tangible button application. Although easy to make, the exact number of turns needed for such coils is difficult to compute, and the coils are usually made by winding until the desired inductance is reached, as measured with an inductance meter. An Agilent Technologies inductance meter, model number U1731A, was used to measure the inductance of the coil. The choice of #33 gauge wire is not critical, and a physically smaller coil can be made by using finer gauge wire. The complete RFID module and finger mounted antenna are shown in Figure 2(b).
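As a check, the quoted coil inductance and the ID2's internal capacitance do resonate at the 125 kHz carrier, via the standard formula f = 1/(2π√(LC)):

    import math

    L = 1.08e-3    # antenna coil inductance in henries (275-turn fingertip coil)
    C = 1500e-12   # ID2 internal resonating capacitance in farads

    f_resonant = 1.0 / (2.0 * math.pi * math.sqrt(L * C))
    print(f"resonant frequency: {f_resonant / 1e3:.1f} kHz")   # ~125.0 kHz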

5.3 Computer Vision

There are two major computer vision approaches for object recognition: appearance based and feature based. Our tangible buttons did not have any distinctive features or textures, so we employed an appearance based approach in order to detect the retro-reflective markers on the tangible buttons (shown in Figure 2(a)).

Our appearance based approach utilizes edges extracted from the images obtained from IR cameras.

¹ ID Innovations, 21 Sedges Grove, Canning Vale, W.A. 6155, Australia.

Edge based recognition was previously used for detecting markers in ARTag [7]. This edge based method achieved better recognition accuracy than threshold based methods, especially when illumination conditions change. The edge based approach was also stable and jitter free, which is important for overall system performance and usability. In SAR environments, illumination changes even more dynamically than in marker based AR. With this consideration, we used an edge based method instead of a threshold based one. We used the Canny edge detection algorithm [6], which is widely used for its accuracy and performance.

After edges were extracted, contour information was obtained from the binary edge images using the method suggested by Suzuki and Abe [19], in order to differentiate tangible buttons from objects of other shapes. Tangible buttons were assumed to be brighter than the background and square in shape. Contours and holes could be determined by the gradient difference between the inside and the outside of the closed curve. The shape of the contour was approximated with the Douglas-Peucker algorithm [6] to eliminate noise and jitter. Finally, by traversing contour vertices, convex, square shaped buttons were identified.
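A minimal sketch of this pipeline, using OpenCV's standard implementations of Canny, the Suzuki-Abe border following method (cv2.findContours) and Douglas-Peucker (cv2.approxPolyDP), could look as follows; the threshold values are illustrative assumptions, not our tuned parameters:

    import cv2

    def find_square_markers(gray_frame):
        edges = cv2.Canny(gray_frame, 50, 150)
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        markers = []
        for contour in contours:
            # Approximate the contour polygon to suppress noise and jitter.
            perimeter = cv2.arcLength(contour, True)
            poly = cv2.approxPolyDP(contour, 0.02 * perimeter, True)
            # Keep convex quadrilaterals of plausible size as button candidates.
            if len(poly) == 4 and cv2.isContourConvex(poly) \
                    and cv2.contourArea(poly) > 100:
                markers.append(poly.reshape(4, 2))
        return markers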

6 RFID Finger-tip Read Resolution Performance

The goal of this evaluation is firstly to understand whether false activations occur when using our RFID activated tangible buttons, and secondly to quantify what error rate is to be expected during prototyping. For example, when waving the glove near buttons without touching them, do button press events occur, and how close is the glove when events occur? This will allow us to better understand their operation and provide a comparison to traditional push button functionality.

We conducted two performance tests to determine the read resolution of the finger-tip mounted antenna during use. The purpose of the first test is to measure the distance from the centroid of the finger mounted antenna to the center of the RFID tag (shown in Figure 3(f)) that is required to register an event. The purpose of the second test is to challenge the results of the first test and demonstrate that closely located RFID buttons can be recognized uniquely. Additionally, the test demonstrates that RFID buttons are suitable for supporting interactive user interface functionality.

6.1 Read Distance Experiment

To measure the activation distance, we prepared a radial measurement apparatus with a series of concentric rings around an RFID tag (shown in Figure 3(a)) to allow the distance to be recorded upon event activation. Each ring had a measurement unit assigned to it, ranging from 9 mm to 40 mm from the centre of the RFID tag. A natural finger orientation, of 45 degrees from vertical, was used throughout the measurement process.

6.1.1 Procedure

To capture the activation distance from different directions we repeated the following procedure from three approach directions: along the X-axis (left-right motion when facing the RFID tag), the Y-axis (vertical motion when facing the RFID tag) and the Z-axis (forward-backward motion when facing the RFID tag). With the reading software operating, the user's finger started on the outside measurement ring. It was then slowly moved towards the center of the RFID tag until an event from the system was registered. The resting position of the finger was recorded. This process was repeated ten times for all three approach directions.



Figure 3: (a) Pose and measurement markers used. (b) Testing pose of the X-axis, moving the finger towards the RFID tag. (c) Testing pose from the Y-axis. (d) Testing pose from the Z-axis. (e) 3x3 grid of RFID tags used for proximity testing. (f) Measurement locations from the center of the fingertip and the center of the antenna to the RFID tag. (g) Summary of statistical results showing event registration distance across the three axes.

Figure 3(a) shows the finger angle pose and measuring apparatus used for the evaluation.

6.1.2 Summary

For each approach angle we describe the distance to the tag with two values: the first is the measurement between the center of the antenna and the center of the RFID tag; the second describes the gap between the closest edge of the RFID tag and the user's finger (both shown in Figure 3(f)).

The first approach angle measured the activation distance when the finger was moved along the X-axis (as shown in Figure 3(b)). The mean distance recorded between the center of the antenna and the RFID tag was 12.2 mm (or touching the side of the RFID tag).

The second approach direction measured the activation distance along the Y-axis (as shown in Figure 3(c)). The mean distance recorded between the center of the antenna and the RFID tag was 21.5 mm (with a distance of 8.5 mm between the edge of the finger and the edge of the tag).

The third approach angle measured the activation distance along the Z-axis (as shown in Figure 3(d)). The mean distance recorded between the center of the antenna and the RFID tag was 14.8 mm (with a distance of 1.8 mm between the edge of the finger and the edge of the tag). Figure 3(g) provides a statistical summary of the activation distances for all three axes using a box and whisker plot.

6.2 Grid Array Experiment

To challenge the results of our initial test we performed a second experiment that uses a grid of closely located tags. For this test we placed nine tags in a 3x3 grid with each tag touching its neighbour (shown in Figure 3(e)). The goal of the closely located tags was to increase the chances of incorrect readings, to indicate how user interfaces with tightly packed buttons would perform using our system.

6.2.1 Procedure

The finger mounted antenna was worn and the user repeatedly pressed the center button. The software was configured to display the unique ID of a tag when it was touched. When the event was registered we compared the displayed ID with the expected ID. This process of touching the middle RFID tag was repeated 20 times. Three outcomes were recorded: correct read, incorrect read and no read.

6.2.2 Summary

A summary of the results can be seen in Table 1. Our results showed that 16 of 20 (80%) button presses were recorded correctly, 2 of 20 (10%) were incorrectly identified, and for 2 of 20 (10%) no event was registered.

6.2.3 Results

The goal of these tests is to validate that RFID tags are a useful tool for capturing tangible button presses without the need for electronics requiring wired switches and micro-controllers, as in traditional prototypes. Specifically, the aim was to demonstrate that button presses can be emulated using RFID tags and that groups of RFID tags can be placed relatively close to each other and be used to successfully capture events. The results of the first test show that the use of RFID tags is suitable for identifying closely located tangible buttons with our glove mounted reader. This is supported by the results of the second test, which challenged the scenario of closely located RFID tags during operation. Our result of 80% successful button clicks is not as reliable as a traditional button; however, we consider this acceptable for an early prototype that has a flexible form with interactive function.

We also observed that with our current configuration RFID event registration can occur before the user physically touches the tangible buttons. Although haptic sensation and button press event synchronization is desirable, the pre-touch button press event allows the RFID tags to be embedded inside tangible buttons without preventing button press event registration. This suits the re-configurable nature of our purpose: to allow dynamic, quick re-configuration for exploring different user interface layouts. In addition, some tuning can be performed to increase or decrease the read resolution of the antenna on the glove, allowing, if necessary, a higher read resolution to be set up.

Table 1: Read results of the group of nine RFID tags

    Correctly Identified    Incorrectly Identified    No Read Recorded
    16                      2                         2
    80%                     10%                       10%

7 Expert Review

To validate our new dynamic SAR design tool, we undertook a qualitative expert review of the design process with professional designers. We wished to understand the impact of our new SAR design tool on the design process. The expert review evaluation methodology allowed us to better understand the overall effectiveness of the design tool in the context of a real design task. This section describes the experimental design, and then outlines the design scenario presented to the professional designers. The results of the expert review are also discussed.

7.1 Experimental Design

We approached the evaluation of the process using a qualitative expert review. Selection was done by picking participants who have extensive design training and experience. We grouped the participants into pairs to stimulate open discussion of the design process with our new SAR design tool. The participants were divided into two teams of two senior designers who have worked in both industry and academia for over 30 years: one team of industrial designers and a second team of two architects.

7.2 Scenario

To put our design process in context for our participants, we selected a scenario that is familiar but also a new design problem. We selected the design of a simple calculator to evaluate our design methodology. For our scenario, a basic calculator consists of sixteen buttons: ten single digit numeric keys, the five basic operators and a clear function. Traditionally, calculators have a square layout (akin to the numeric keypad on a computer keyboard); however, to challenge the designers we provided a number of shapes that made it difficult to use a traditional layout. We provided two scenarios: the first was the design of a bone shaped (letter H shape) calculator (as shown in Figure 4). The second was a long skinny bar shape. Both scenarios were designed to stop the designers falling back on the standard square configuration.

7.3 Protocol

The protocol for this experiment is as follows:

1. We received the participants in pairs (a team) in a separate location to that of the experiment. This allowed us to concentrate on the experimental procedure without being distracted by the apparatus. The nature of the review was discussed and permission was gained for audio and photographic recording.

2. The aim of the project was explained, making sure that the explanation did not introduce bias into the participants' minds. Participants discussed their experience in the domain of the review.

3. Instructions were given on how to perform the experiment, such as ensuring they speak aloud as they work through the scenario.

Table 2: Participant Questions

    Q1  Is this a useful tool to you for design?
    Q2  Would the tool or the process interfere with your current design process?
    Q3  What aspects of this tool are useful or not useful in the design process?
    Q4  Does this allow you to do something you could not do before? If so, what does it allow you to do?
    Q5  Where in the design process would you use this technique?
    Q6  Which features are useful in the system?
    Q7  What needs to be improved in the system?
    Q8  If we addressed your concerns, could you see this being used in industrial design?

4. The survey questions were read out and discussed before the actual design process so that the participants knew what would be expected of them at the end.

5. The participants were led through to the experiment location and an introduction was given on how to use the system.

6. The design scenario was then described.

7. They progressed through the two scenarios to create layouts for the calculator.

8. As they proceeded through the design, the observer was instructed to ask questions to stimulate the participants' conversation.

9. Once completed, they were given a verbal survey.

The scenario was supported by the SAR design tool by providing the participants with sixteen tangible buttons that can be re-configured into different arrangements. Each button retained its functionality during this process. The sixteen tangible buttons could be moved around a 22 cm x 34 cm surface to create a variety of different arrangements. The texture-loading palette was configured to display the available appearances on the left hand side of the work area, allowing the designers to quickly change the appearance by placing a tangible button in the pre-defined area. The basic calculator functionality was provided by an application with a display texture used to present simulated output as a virtual text box, providing the output for the calculator's display.

7.4 Results

Overall the feedback from the expert review was very positive. The survey questions are listed in Table 2. A summary of the responses for each question is discussed.

Question one (useful for design?) was designed to validate whether the design tool was worth using. The industrial designers pointed out that the 3D design will improve the thoroughness of the product design; that is, by having a real physical artifact they can get a more complete representation of the final product. The architects were more circumspect but felt that the physicality of the tool meant that users could gain a better feel for a design early in the process.

Figure 4: Calculator example using RFID-enabled tangible buttons.

Question two (interfere with current process?) was constructed to get the designers to think about whether this would fit into their natural design process or would slow down their creative process. Common feedback across both groups on this question was that the physical structures may interfere with the design process if the basic shapes had not been established. Both groups agreed that if the shapes have been decided (or are fixed) then the tool would aid the process.

Question three (aspects useful for the design process?) was designed to get the participants to focus on the best features of the tool. The participants pointed out that this would be useful in conjunction with the limitations pointed out previously. One participant expressed this by noting that it "allowed the users to put some air in the balloon"; in other words, it allowed a quick first-cut design to try out some ideas, which is the exact goal of our overall system.

Question four (could it do something not possible before?) had some clear consistencies between the participants; in particular, it allowed early design iterations to be quickly examined. One group pointed out that the design tool could be used to overlay various layers of the design (imagine the wiring routes in a car) so that a designer could quickly get a good idea of the design in their head.

Question five (where in the design process?) was designed to further encourage the participants to think about how this would fit into their natural design process. Feedback for this was very consistent, not only between separate groups but also between the individuals in each group. It was agreed that this tool would fit best into the early design process. The industrial designers also pointed out that the design tool would reveal ergonomic and drafting errors early in the design.

Question six (what features are useful?) got the participants to pick features they thought were useful. Participants said that working on a physical prototype gives a more natural feeling than working on a screen with a wire frame and tools. One group of industrial designers identified some of the opportunities that the combination of plain paper and SAR offered, and said that this simplicity felt good. This was not intended to be part of the tool, but it is easy to see the freedom that adding the sketch feature adds to the design tool.

Question seven (what needs to be improved?) got them to isolate features they thought were less useful. The participants were concerned about having fixed sized objects enforced on them. Both groups reiterated that the design tool would be useful if the shapes were fixed or decided and the design was more about the surface design of the product. For example, toasters are generally of very similar shape, but the controls, style and aesthetics are the variant part.

Question eight (could this be used in industrial design?) showed that the users were generally very positive about the possibilities of this technology. They could see a place for this in their design processes. The industrial designers believed that this process would be good in the customization part of the process, where the basic shape and use of the product has been designed but final stylistic and operational features are still to be decided.

7.5 Discussion

Overall, while not suitable for the entire design process, the designers saw significant advantages in using the tool for early design stages. For instance, the design of a number of similar products, such as variations on radio fascias, would benefit from this approach. One designer stated, "Having a tool box of controls would allow you to rapidly layout a radio, save the design, then have a number of working variations quite quickly". Another designer commented, "Anything which provides a more true to life three dimensional simulation of the design intent will allow more thorough investigation of alternatives and better verified outcomes." This indicated to us that the designers liked the physicality of a SAR tool that implemented real physical controls that could be held, arranged and explored.

An unexpected use for the tool that was pointed out was the evaluation of ergonomics. An industrial designer stated, "It could also provide a quick and effective rig for evaluation of various ergonomic configurations." The industrial designers were also excited about the opportunity to realistically represent reach, texture and scale, which they normally do not see until the final prototype. The architects indicated that the use of the tool for generating briefs (requirements and constraints) by modeling scenarios in the tool would be very useful in their field. Overall, a consistent theme was that the tool was useful in the early phases of a design, in particular after the initial concept was done but before the finalized design was set. Both teams agreed that this tool would be useful in a professional environment.

7.6 Improvements

There were three notable improvements suggested by the reviewers. Firstly, grouping of components, allowing multiple tangible buttons to be moved concurrently, which would aid quick modifications of the design; this is the equivalent of a group select and drag in more traditional user interfaces. Secondly, the ability to draw on the projection and have the shape transferred back to the design software; this is desirable as it brings the design tool back into the designer's process. Thirdly, allowing designers to place annotations on the design so that design decisions can be recorded. The last two improvements can be seen to smooth the transition between the designer's process and the design tool.

8 Conclusion

This paper has presented a novel user interface methodology to be used for product design in a SAR environment. Tangible buttons are leveraged to provide a physical interface that allows the designer and end user to re-configure the layout of the user interface during development. To implement the system, we employ spatial augmented reality for appearance details, vision tracking to capture the physical movement of tangible buttons, and a custom fingertip resolution RFID reader to capture tangible button presses. The performance of the read resolution was evaluated and the results validated the use of RFID tags for interactive product designs.

The system was evaluated by professional designers via an expert review. Our initial qualitative evaluation has shown that incorporating tangible buttons improves on the simplified haptic feedback of SAR visualizations. The expert review indicated the tool is useful for industrial designers and architects. They stated that the haptics of being able to move physical objects around gave a solid connection to the final product, and they provided a number of future directions.

In the future we will explore the localized read area of the fingertip mounted antenna, as it would be possible to place antenna coils on all of the fingers of the glove, allowing a multi-touch model to be used. This would be relatively easy to add to the current design by placing identical coils on each finger along with suitable electronics. We would also like to extend the electronics to support a push-and-hold button model, and to develop additional UI controls, such as sliders and dials. Finally, in the future we could also incorporate a mechanical clicking mechanism, which would further increase the haptic sensation complexity.

References

[1] E. Akaoka and R. Vertegaal. DisplayObjects: prototyping functional physical interfaces on 3D styrofoam, paper or cardboard models. In ACM Conference on Human Factors in Computing Systems, 2009.

[2] D. Avrahami and S. E. Hudson. Forming interactivity: a tool for rapid prototyping of physical interactive products. In Proceedings of the 4th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, DIS '02, pages 141–146, New York, NY, USA, 2002. ACM.

[3] D. Bandyopadhyay, R. Raskar, and H. Fuchs. Dynamic shader lamps: Painting on movable objects. In IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 207–216, 2001.

[4] O. Bimber and R. Raskar. Spatial Augmented Reality: Merging Real and Virtual Worlds. A K Peters, Wellesley, 2005.

[5] M. Broxton, J. Lifton, and J. A. Paradiso. Localization on the pushpin computing sensor network using spectral graph drawing and mesh relaxation. SIGMOBILE Mob. Comput. Commun. Rev., 10:1–12, January 2006.

[6] D. Douglas and T. Peucker. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. The Canadian Cartographer, 10(2):112–122, 1973.

[7] M. Fiala. ARTag Revision 1, A Fiducial Marker System Using Digital Techniques. NRC Technical Report (NRC 47419). National Research Council of Canada, 2004.

[8] K. Finkenzeller. RFID Handbook: Radio-Frequency Identification Fundamentals and Applications. John Wiley and Sons Ltd, 1999.

[9] K. Fishkin, M. Philipose, and A. Rea. Hands-on RFID: wireless wearables for detecting use of objects. In Proceedings of the Ninth IEEE International Symposium on Wearable Computers, pages 38–41, 2005.

[10] S. Greenberg and C. Fitchett. Phidgets: easy development of physical interfaces through physical widgets. In Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, pages 209–218, Orlando, Florida, 2001. ACM.

[11] M. Hisada, K. Takase, K. Yamamoto, I. Kanaya, and K. Sato. The hyperreal design system. In IEEE Virtual Reality Conference, 2006.

[12] J. C. Lee, D. Avrahami, S. E. Hudson, J. Forlizzi, P. H. Dietz, and D. Leigh. The Calder toolkit: wired and wireless components for rapidly prototyping interactive devices. In Proceedings of the 5th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, pages 167–175, Cambridge, MA, USA, 2004. ACM.

[13] M. R. Marner and B. H. Thomas. Augmented foam sculpting for capturing 3D models. In IEEE Symposium on 3D User Interfaces, Waltham, Massachusetts, USA, 2010.

[14] L. Muguira, J. Vazquez, A. Arruti, J. de Garibay, I. Mendia, and S. Renteria. RFIDGlove: A wearable RFID reader. In IEEE International Conference on e-Business Engineering, pages 475–480, 2009.

[15] S. R. Porter, M. R. Marner, R. T. Smith, J. E. Zucco, and B. H. Thomas. Validating the use of spatial augmented reality for interactive rapid prototyping. In IEEE International Symposium on Mixed and Augmented Reality, 2010.

[16] S. Pugh. Total Design: Integrated Methods for Successful Product Engineering. Addison-Wesley, 1991.

[17] R. Raskar, G. Welch, K. Low, and D. Bandyopadhyay. Shader lamps: Animating real objects with image-based illumination. In Rendering Techniques 2001: Proceedings of the Eurographics, pages 89–102, 2001.

[18] A. Schmidt, H.-W. Gellersen, and C. Merz. Enabling implicit human computer interaction: A wearable RFID-tag reader. In Proceedings of the 4th International Symposium on Wearable Computers, pages 193–194, 2000.

[19] S. Suzuki and K. Abe. Topological structural analysis of digitized binary images by border following. CVGIP, 30(1):32–46, 1985.

[20] J. Tanenbaum, K. Tanenbaum, and A. Antle. The Reading Glove: Designing interactions for object-based tangible storytelling. In Proceedings of the First Augmented Human International Conference, pages 132–140, 2010.

[21] B. H. Thomas, M. Smith, T. Simon, J. Park, J. Park, G. S. V. Itzstein, and R. T. Smith. Glove-based sensor support for dynamic tangible buttons in spatial augmented reality design environments. 2011.

[22] J. Verlinden, A. de Smit, A. Peeters, and M. van Gelderen. Development of a flexible augmented prototyping system. Journal of WSCG, 2003.


A Virtual Touchscreen with Depth Recognition

Gabriel Hartmann and Burkhard C. Wünsche

Graphics Group, Department of Computer Science, University of Auckland, New Zealand

Email: [email protected], [email protected]

Abstract

While touch interfaces have become more popular, they are still mostly confined to mobile platforms such as smart phones and notebooks. Mouse interfaces still dominate desktop platforms due to their portability, ergonomic design and large number of possible interactions. In this paper we present a prototype for a new interface based on cheap consumer-level hardware, which combines advantages of the mouse and touch interface, but additionally allows the detection of 3D depth values. This is achieved by using a web cam and point light source and detecting hand and shadow gestures in order to compute 3D finger tip positions. Our evaluation shows that the concept is feasible and that more powerful interactions than with traditional interfaces can be achieved. Limitations include a reduced input precision, insufficient stability of the utilised computer vision algorithms, and the requirement of a stable lighting environment.

Keywords: touch interface, gesture recognition, stereo vision, shadow detection

1 Introduction

The past decade has brought a remarkable shift in the way humans interact with computers. For the first time the traditional keyboard and mouse interfaces have a real alternative in touchscreen and motion based controls. This shift has coincided with the explosion in popularity and unprecedented affordability of mobile devices and MEMS accelerometers, gyroscopes and magnetometers.

So far, touch and motion based interfaces have been almost universally confined to mobile devices, public displays, entertainment systems, and commercial devices. Touchscreens have been present on commercial devices in the form of point of sale devices, automatic teller machines, and similar equipment for some time. In consumer entertainment systems, the Nintendo Wii was the first device to embrace motion based input and see wide spread adoption. It was quickly followed by competing products from Microsoft with its Kinect, and Sony with its Move. In mobile devices, Apple's iPhone was the first widely adopted device which employed only touch and motion based controls. The concepts have been quickly adapted by a slew of competitors, and a similar development can be observed for the iPad, the first commercially successful notebook using these interface technologies.

Although the above devices are used widely, mostly by non-specialist users, the employed interfaces have not found their way back to the personal computing desktop platform. One reason for this lack of feature migration has to do with the excellence of the current system. The mouse and keyboard combination is an extremely versatile system that is well understood and has been used happily by millions of people for decades. The mouse in particular holds some serious advantages over the touchscreen interface on the desktop:

• The mouse cursor obscures less screen space than a finger or stylus.

• Resting the hand on a mouse is more comfortable for complex interactions than lifting the hand and touching a screen.

• Mouse motions can be scaled in order to cover large screen space with small physical movements.

• The screen is not soiled by dirt and oil from the user's fingers.

• A mouse is highly portable and can be used with different computers.

• Traditional display devices with a mouse are usually more affordable than touch screens.

The mouse, however, is not a perfect input device. One major shortcoming is its limitation to a single point of interaction. The use of two mice has been explored in academia for more than a decade (Latulipe et al. 2006, Gonzalez & Latulipe 2011). In contrast to multi-touch input, this functionality is not widely supported and few real-world applications exist. One of them is virtual surgery simulation, where the two mice control different instruments used by the surgeon (SIMTICS Ltd. 2011).

Mice are also limited in their repertoire of actions. The single-click, double-click, and drag operations represent the main functionalities of a mouse. Additional functionalities have been added in the form of multiple mouse buttons and scroll wheels, and by combining mouse interactions with keyboard interactions.

Some operating systems and applications have begun to emulate touch and motion screen interactions via the mouse. For example, Windows 7 will minimise all other windows if another window is dragged back and forth repeatedly, emulating a shake. The Opera web browser employs a large variety of optional mouse gestures which can control the web browsing experience. The attempts to incorporate these sorts of interaction into the desktop experience highlight the fact that there are interactions that touch and motion interfaces provide that are desirable on the desktop, but cannot be accomplished with the mouse.

In this paper we explore whether it is possible to produce an interface which combines some of the advantages of the mouse and touch screen, is portable, uses cheap consumer-level hardware, and has additional capabilities neither the mouse nor touch screen offers.

Section 2 analyses the problem, proposes a solution concept, and derives design requirements for it. Section 3 reviews the literature in this field and discusses relevant technologies. The design of our solution and implementation details are explained in sections 4 and 5. Section 6 evaluates our solution, compares it with our original goals, points out limitations, and presents an informal user study. We conclude this paper in section 7, and section 8 gives a short overview of future work.

2 Problem Analysis and Design Requirements

2.1 Problem Analysis and Solution Proposal

Two major disadvantages of touchscreens are their cost and the fact that the interface and display device cannot be separated, i.e., reuse and portability are limited. We therefore propose to use as interaction space any flat surface, e.g., the user's desk. In order to minimise hardware requirements, the interaction device is the user's hand. Hand tracking is a well researched subject, but is still technologically challenging, and the best results are achieved by using stereo vision (two cameras) or markers. Since many computers already have a single inbuilt web cam, we will only require one such camera, which is oriented to view the interaction space, e.g., the desk. In order to obtain more information about the user's hand gesture and position we use a point light source, which casts a shadow onto the interaction surface.

We propose to track the hand and its shadow in order to obtain 3D locations. This corresponds to the mouse drag operation, but additionally adds a third dimension given by the hand's height over the interaction surface (e.g., desk). Different interaction modes, defined for the mouse by different mouse buttons and simultaneously pressed keyboard keys, are achieved by performing a limited gesture recognition.
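The underlying geometry can be sketched as follows (this is an illustration of the idea, not our implementation): given the calibrated camera and light source positions and the desk plane z = 0, the fingertip lies both on the camera ray through its image pixel and on the line joining the light source to its shadow point on the desk, so its 3D position (including height) follows from intersecting two lines. All numeric values below are assumed calibration results.

    import numpy as np

    def closest_point_between_lines(p1, d1, p2, d2):
        # Midpoint of the shortest segment between two 3D lines
        # p1 + t1*d1 and p2 + t2*d2 (robust to slightly skew lines).
        n = np.cross(d1, d2)
        t1, t2, _ = np.linalg.solve(np.column_stack((d1, -d2, n)), p2 - p1)
        return 0.5 * ((p1 + t1 * d1) + (p2 + t2 * d2))

    # Assumed calibration results and measurements, in meters.
    camera_pos = np.array([0.0, -0.4, 0.5])
    light_pos = np.array([0.3, 0.2, 0.8])
    shadow_on_desk = np.array([0.05, 0.10, 0.0])    # shadow tip on the z = 0 plane
    ray_to_finger = np.array([0.08, 0.35, -0.45])   # camera ray through finger pixel

    fingertip = closest_point_between_lines(camera_pos, ray_to_finger,
                                            light_pos, shadow_on_desk - light_pos)
    print(fingertip)   # the z component is the height above the desk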

2.2 Design Requirements

We assume a low specification web cam with a minimum resolution of 640x480 pixels. Calibration of both the desktop surface and the camera is achieved using a calibration grid, which can be printed out by any printer (although a laser printer is preferable). Our solution should have a low computational complexity, such that it can be used even on entry-level desktop machines without an advanced CPU/GPU. Finally, we require a single point light source which illuminates the interaction surface. For illumination a lamp with a light bulb is sufficient, as long as it is positioned above the user and preferably to the side. Examples are most ceiling lights (depending on the desk position) or a clip-on desk lamp.

In terms of functional requirements, the aim of our research is to provide device level support for touch and motion controls. The interpretation of device output is not a concern. For example, we are interested in creating 3D coordinates reflecting the user's interactions, but we leave the interpretation of these coordinates, e.g., as mouse click or mouse drag, up to the application developer.

Hence we can summarise the goals of our research as follows: we want to create a system using a web cam, point light source and flat illuminated surface in order to determine 3D positions for hands and fingertips above and on the desktop surface. We want to be able to detect touches on the surface, but we do not try to emulate a multi-touch interface. The goal of the research is to investigate the feasibility of the described configuration as an inexpensive 3D interaction interface for a desktop platform.

3 Literature Review

Hand based computer interaction is a popular research area due to its intuitiveness and promise to enable true 3D interactions.

In situations where the accuracy and speed of hand tracking are important, data gloves are considered the best choice. The devices usually employ electro-mechanical, electro-optical or magnetic sensing technologies. They are application independent and single purpose, devoted entirely to hand tracking (Erol et al. 2007). Data gloves produce real-time results and are able to capture all the degrees of freedom a human hand has to offer. However, they are expensive, require precise calibration, and might hinder natural hand motion to some degree.

Computer vision based hand tracking approaches can overcome the expense, inconvenience, and unnaturalness of data gloves. Cameras for computers are now widely available and inexpensive, and the contact-free image-based sensing does not interfere with hand motions. However, the difficulty of 3D hand pose estimation has so far resisted attempts to produce results which are comparable to those produced by data gloves.

The main difficulties of computer vision based hand pose estimation are high-dimensionality, self-occlusion, high computational requirements, uncontrolled environments, and rapid hand motion (Erol et al. 2007). High-dimensionality refers to the fact that the human hand has over 20 degrees of freedom without taking into account the position and orientation of the hand as a whole (Erol et al. 2005). Self-occlusion occurs when fingers and/or palm overlap on the two-dimensional image of a camera. Uncontrolled environments can result in arbitrarily complex backgrounds and unpredictable lighting scenarios. Rapid hand motion refers to the hardware and software limitations that constrain the ability of applications to track rapidly moving hands. All the above points make complex algorithms necessary to detect, track and disambiguate hand configurations, which leads to a high computational complexity.

Each of the difficulties described is non-trivial to solve individually. Taken together, it is unsurprising that no general solution for computer vision based 3D hand pose estimation and tracking exists. All attempts so far make assumptions which eliminate or mitigate some of the difficulties.

3.1 Two Dimensional Fingertip Tracking

Hand tracking can be simplified by only determining certain key features such as the finger tips. Hardenberg & Berard (2001) use the position of fingertips and the number of detected fingers in order to define pointer positions and different commands akin to mouse clicks. Fingertips are detected by employing an image segmentation algorithm and fitting circles to foreground image sections. The problems of self-occlusion are ignored, with the tacit assumption that as long as fingertips are sufficiently visible to be identifiable, operations can continue. For fast hand motions, finger tip positions are not individually identified but estimated from previous frames.

The authors present four proof-of-concept applications. Three of the applications replicate mouse functionalities, with the fourth allowing for multiple input positions. The applications highlight some difficulties of the system. As it is entirely a 2D system, no touching of the interacting surfaces is detected. Instead, the authors' web browsing and painting application represents mouse clicks by maintaining a hand position for one second. The system also supports simple finger counting gestures for a slide presentation application. These gestures have no movement component or intuitive relation to their function. For example, displaying two fingers indicates a need to move to the next slide, and three fingers indicate a move to the previous slide. This requires memorization of the slide application commands by the user. No attempt is made to actively deal with the problem of misclassifying shadows of fingertips as actual fingertips. The authors' experimental results showed that this error was not uncommon.

Song et al. (2007) improve Hardenberg and Berard's algorithm by additionally taking the finger shape into consideration, which is done by segmenting and classifying the pixel region connecting the finger tip to the palm. Laptev & Lindeberg (2001) use particle filtering and multi-scale image features for finger tip detection and tracking. Terajima et al. (2009) use a template approach, which also gives some 3D information. Hsieh et al. (2008) use finger tip detection for handwriting recognition. The finger tip motion is obtained from frame differences. The search for a new finger position is sped up by predicting its position using a Kalman filter.

3.2 Surface Touch Detection from Camera Images

Malik & Laszlo (2004) use stereo cameras to detect touch interactions with the underlying surface and in this way construct a virtual touchpad. An important constraint is that the background must be black and rectangular. This improves calibration and avoids problems with shadows.

Detecting 3D motion with a single camera is more difficult. Wang et al. (2007) capture hand motion with a single camera in real-time by combining a model-based and an appearance-based method. Interference among fingers is resolved using k-means clustering and particle filters.

Segen & Kumar (1999) simulate touch interactions by additionally using a single point light source, whose position is determined in a calibration step. The surface upon which the light source casts shadows must also be defined, although this step is not explicitly described in the paper. Instead, the camera's position is defined in terms of the surface plane, which is defined as the Z = 0 plane. Segen and Kumar do not attempt to detect surface touches and indicate use of a uniform background. They find fingertips and finger orientation with exactly the same method employed by Malik & Laszlo (2004). The orientation of a finger is defined by the line which extends through the fingertip and the midpoint of the end points of the two vectors which designated it as such.

Much of this 3D information is not employed in their proof-of-concept applications. Actions are almost entirely defined by static gestures, with the exception of a "clicking" gesture which employs a bent finger motion. This gesture requires a stationary hand at the time of its execution, which is perhaps problematic in the suspended hand positions indicated by the authors. Once a gesture is detected, the pose of the hand in space is used to modify the application of the gesture's command. For example, a two finger gripping gesture can be used to manipulate a virtual robot arm. The arm is oriented according to the lines defining the fingers. The distance between the gripping portions of the robot hand depends on the distance between the two fingers in the gesture. Using three fingers to grab a virtual object is, however, impossible, as no three finger gesture is defined. Likewise, if two fingers are used but they are not straight, the gesture will be unrecognised and no input will occur.

3.3 3D Hand Pose Estimation

A large variety of hand tracking algorithms has been proposed. A good survey is given in (Mahmoudi & Parviz 2006). Two important categories are marker-based and marker-less methods.

Marker-based hand tracking algorithms require the user to wear point or area markers (Park & Yoon 2006, Lee & Woo 2003, Wang & Popovic 2009). Robustness and interactive speed are achieved by employing simplified hand models in order to resolve ambiguities in the tracked marker positions. However, the need for auxiliary devices (markers, gloves) can be inconvenient for the user and often requires some type of calibration, e.g., to align marker positions with positions on the underlying hand model.

Without markers, the easiest way to identify (potential) hand shapes is by using a skin colour classifier (Kakumanu et al. 2007, Vassili et al. 2003). Sensitivity to changes in the illumination can be reduced by using a perception-based colour space (Chong et al. 2008). The 3D position and orientation of the hand shape can be obtained by using a 3D hand model and searching for a mapping between it and the perceived hand shape, subject to the model's inherent constraints (e.g., joint constraints and rigidity of bones) (Stenger et al. 2001, 2006). A different approach is to perform the matching between hand features rather than the entire shape (Chen et al. 2007).

Single camera systems suffer from high computational complexity and limited robustness. They work best if the range of possible hand motions is constrained (Liu 2010, Liu et al. 2011). Tracking can be simplified by using depth information obtained using multi-camera vision or stereographic systems (Argyros & Lourakis 2006).

4 Design

In order to simplify the design we assume that the background is stable (e.g., a desk surface rather than an office with people moving around) and that only one hand and its shadow must be tracked. The techniques presented in this section will also work for two hands, as long as the hand images and shadows don't overlap. However, this extension was deemed unnecessary for our proof-of-concept application. The tracking of the hand shape and its shadow requires the following three tasks to be performed for each frame: segmentation, feature detection, and feature position estimation.

4.1 Segmentation

Segmentation determines the objects of interest and separates them from the background. We assume a static background, such as a desk surface, and a static lighting environment using a single point light source, e.g., a desk or ceiling lamp.


4.1.1 Background Classification

Segmentation is achieved by first computing a statistical model of the background. Since the user has control over the environment we can assume that no foreground objects, such as skin coloured objects and shadows, are in the field of view. We compute for each pixel an average value and average difference, which approximates the standard deviation but is faster to compute (Bradski & Kaehler 2008). For the computation of both statistics the three RGB channels remain separated. The statistics are used to define for each pixel a range of colours considered to represent the background. The threshold values representing valid background pixels were determined experimentally in order to optimise correct classifications (see section 5). All pixels outside this range are considered foreground. The statistical model is necessary since pixel colours change between frames even for static environments. Causes are pulsed light sources such as energy saving (fluorescent) lights, slight vibrations, and camera specific issues such as noise.
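A minimal sketch of this background model, written with OpenCV’s C++ API (the implementation described in section 5 uses the legacy cvAcc/cvAbsDiff calls); the frame count and the scale factors kLo and kHi are illustrative stand-ins for the experimentally tuned values:

#include <opencv2/opencv.hpp>

// Accumulate per-pixel mean and mean absolute frame-to-frame difference
// over numFrames frames; the three channels remain separate (CV_32FC3).
void learnBackground(cv::VideoCapture& cap, int numFrames,
                     cv::Mat& mean, cv::Mat& avgDiff)
{
    cv::Mat frame, frameF, prevF, sum, diffSum;
    for (int i = 0; i < numFrames; ++i) {
        cap >> frame;
        frame.convertTo(frameF, CV_32FC3);
        if (sum.empty()) {
            sum     = cv::Mat::zeros(frameF.size(), CV_32FC3);
            diffSum = cv::Mat::zeros(frameF.size(), CV_32FC3);
        } else {
            cv::Mat d;
            cv::absdiff(frameF, prevF, d);   // frame-to-frame variation
            diffSum += d;
        }
        sum += frameF;
        prevF = frameF.clone();
    }
    mean    = sum / numFrames;               // per-pixel average colour
    avgDiff = diffSum / (numFrames - 1);     // approximates the std. deviation
}

// Pixels outside [mean - kLo*avgDiff, mean + kHi*avgDiff] are foreground.
// Section 5.1 reports typical scale factors of 7.0 above and 6.0 below.
cv::Mat foregroundMask(const cv::Mat& frame, const cv::Mat& mean,
                       const cv::Mat& avgDiff,
                       double kLo = 6.0, double kHi = 7.0)
{
    cv::Mat f, bg;
    frame.convertTo(f, CV_32FC3);
    cv::inRange(f, mean - kLo * avgDiff, mean + kHi * avgDiff, bg);
    return ~bg;   // in range = background, so invert for the foreground
}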

4.1.2 Hand and Shadow Identification

The second step of the segmentation process divides the foreground pixel colour histogram into regions representing the hand and its shadow (see figure 1). This is achieved by observing that shadows change predominantly the “Value” of the HSV colour of the object onto which they are cast, whereas the hue and saturation show little variation. Since we restrict ourselves to one-handed interactions, the shadow is always cast onto the interaction plane. We hence compare for each pixel of the foreground region its colour with the pixel’s learned average background colour. This comparison is performed in the HSV colour space. If the difference occurs largely for the value (V) component of the pixel, and is accompanied by a slight decrease in the saturation (S) channel, then the pixel is considered to be part of the hand shadow. All other foreground pixels are considered to be part of the hand region. All thresholds for these tests were determined experimentally.
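A sketch of this per-pixel test, assuming the learned background mean has been converted back to an 8-bit BGR image; the V and S thresholds of 5 and 2 follow the typical values reported in section 5.1:

#include <opencv2/opencv.hpp>

// Split foreground pixels into hand and shadow masks by comparing each
// pixel with the learned background colour in HSV space.
void classifyForeground(const cv::Mat& frameBGR, const cv::Mat& bgMeanBGR,
                        const cv::Mat& fgMask,
                        cv::Mat& handMask, cv::Mat& shadowMask)
{
    cv::Mat frameHSV, bgHSV;
    cv::cvtColor(frameBGR, frameHSV, cv::COLOR_BGR2HSV);
    cv::cvtColor(bgMeanBGR, bgHSV, cv::COLOR_BGR2HSV);

    handMask   = cv::Mat::zeros(fgMask.size(), CV_8U);
    shadowMask = cv::Mat::zeros(fgMask.size(), CV_8U);

    for (int y = 0; y < fgMask.rows; ++y)
        for (int x = 0; x < fgMask.cols; ++x) {
            if (!fgMask.at<uchar>(y, x)) continue;
            cv::Vec3b p = frameHSV.at<cv::Vec3b>(y, x);
            cv::Vec3b b = bgHSV.at<cv::Vec3b>(y, x);
            int dV = b[2] - p[2];    // shadows mainly darken the value
            int dS = b[1] - p[1];    // and slightly reduce the saturation
            if (dV > 5 && dS > 2)    // assumed thresholds (cf. section 5.1)
                shadowMask.at<uchar>(y, x) = 255;
            else
                handMask.at<uchar>(y, x) = 255;
        }
}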

Figure 1: The segmentation process separates foreground pixels into hand regions and shadow regions. The tips of each region are determined by computing an oriented bounding box for the largest connected component of each region, and selecting the point closest to its top-most edge.

4.2 Finger Tip Detection

In order to define meaningful interactions finger tips must be detected. The finger tip location can be used to select points, draw shapes, or define different interaction modes according to the relative position of multiple finger tips.

4.2.1 Single Feature Point Detection

A single feature point for pointing and selection operations is defined by computing the minimum oriented bounding box of the contour of the hand region. We then determine the edge nearest to the tip of the hand. Since the user usually sits opposite to the web cam, and since the hand and wrist usually have a smaller width than length, we use the short edge of the bounding box furthest from the bottom of the camera image. The feature point for selection operations is then given by the point on the contour closest to this edge. This algorithm gives a meaningful result for whole hand gestures (in which case the tip of the middle finger would be the feature point), as well as for single finger gestures, such as using the index finger for selection, or when holding a pointing device such as a pen. Figures 1 and 2 illustrate these three cases both for the hand and the shadow region.
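A sketch of this selection-point computation, assuming the hand contour has already been extracted (e.g., with cv::findContours); the edge test and the distance-to-midpoint shortcut are illustrative simplifications:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <limits>
#include <vector>

// Return the contour point nearest to the short bounding-box edge that is
// furthest from the bottom of the camera image (smallest y, as y grows down).
cv::Point tipPoint(const std::vector<cv::Point>& contour)
{
    cv::RotatedRect box = cv::minAreaRect(contour);
    cv::Point2f v[4];
    box.points(v);

    float shortLen = std::min(box.size.width, box.size.height);
    int bestEdge = 0;
    float bestY = std::numeric_limits<float>::max();
    for (int i = 0; i < 4; ++i) {
        cv::Point2f a = v[i], b = v[(i + 1) % 4];
        float len = std::hypot(a.x - b.x, a.y - b.y);
        if (std::abs(len - shortLen) > 1.0f) continue;  // keep short edges only
        float midY = 0.5f * (a.y + b.y);
        if (midY < bestY) { bestY = midY; bestEdge = i; }
    }

    cv::Point2f mid = 0.5f * (v[bestEdge] + v[(bestEdge + 1) % 4]);
    cv::Point tip = contour[0];
    double best = std::numeric_limits<double>::max();
    for (const cv::Point& p : contour) {
        double d = std::hypot(p.x - mid.x, p.y - mid.y);
        if (d < best) { best = d; tip = p; }
    }
    return tip;
}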

Figure 2: A selection/drawing operation using a single feature point defined as the tip of the index finger (left) or the tip of an external pointing device such as a pen (right).

4.2.2 Multiple Finger Tip Detection

Detecting multiple fingertips and their shadows is a three-step process. The first two steps of the process correspond to the popular convex hull and convex defect detection approach (Homma & Takenaka 1985, Liu et al. 2011). The final step is the removal of defects irrelevant to fingertip detection.

In the first step a convex hull is placed around the hand segment. Defects are defined in relation to this hull. Any region within the convex hull which is not part of the hand segment and which is adjacent to an edge of the convex hull is considered a defect. If this algorithm is applied to a hand with spread fingers, six defects are commonly detected. Four of these defects lie between the fingers and the other two lie between the smallest finger and the wrist and between the thumb and the wrist. All but two of the endpoints of the defect regions coincide with fingertip positions.

In order to remove endpoints which do not correspond to fingertips, we use the observation that convex defects between fingers are deeper than they are wide (Liu et al. 2011). Figure 3 gives an example. Note that this algorithm only works as long as the fingers remain approximately straight. Fingers which are bent excessively can cause their defining defect to be rejected and their fingertip position lost as a result.
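This filter can be sketched with OpenCV’s convexity-defect support as follows; merging the duplicate tips that adjacent defects share is left out for brevity:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Convex hull + convexity defects; keep only defects that are deeper than
// they are wide, whose endpoints then approximate fingertip positions.
std::vector<cv::Point> fingerTips(const std::vector<cv::Point>& contour)
{
    std::vector<int> hullIdx;
    cv::convexHull(contour, hullIdx, false, false);   // hull as indices
    std::vector<cv::Vec4i> defects;
    cv::convexityDefects(contour, hullIdx, defects);

    std::vector<cv::Point> tips;
    for (const cv::Vec4i& d : defects) {
        cv::Point start = contour[d[0]];    // hull point entering the defect
        cv::Point end   = contour[d[1]];    // hull point leaving the defect
        double depth = d[3] / 256.0;        // depth is stored in fixed point
        double width = std::hypot(start.x - end.x, start.y - end.y);
        if (depth > width) {                // inter-finger defects only
            tips.push_back(start);
            tips.push_back(end);
        }
    }
    return tips;
}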


Figure 3: Finger tip detection by computing convex defects of the convex hulls of the hand (left) and of the shadow region (right).

4.3 Calibration for Depth Calculations

Accurately determining fingertip positions in three dimensions requires a one-off calibration step defining the relative positions and orientations of the camera, desktop surface, and light source.

4.3.1 Camera Calibration

In order to simplify the subsequent computation of 3D positions we would like to have an idealised camera, i.e., a “perfect pinhole camera”. A pinhole camera does not have a lens and hence does not introduce radial distortion. The image plane of a pinhole camera is exactly perpendicular to the optical axis, and so tangential distortion, which can be introduced by an imprecise alignment of optical axis and image plane, is absent. A pinhole camera produces images which are a perfect projection of the objects in the world onto the image plane. Its well-defined geometry aids in the determination of the position of objects in the world and is therefore desirable in computer vision applications.

In our application we assume that users utilise cheap consumer-level web cams. We hence have to correct inherent distortions, which is achieved by imaging a known object and comparing properties derived from the image (e.g., distances and angles between lines) with the known ones. We use the popular chessboard calibration grid shown in figure 4, since open source software for the calibration is available, and since it allows us to define the interaction plane as a by-product of the camera calibration.

Figure 4: An image of a chessboard pattern of known dimensions is sufficient to determine the distortion characteristics of the camera, and the camera’s location in reference to the surface upon which it is placed. Furthermore, corner points of the chessboard pattern can be used to define the surface plane equation.
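A calibration in this spirit can be sketched with OpenCV’s standard routines; the board size and square size are assumptions, and the board pose recovered for the first view doubles as the desk pose:

#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the camera matrix K and lens distortion coefficients from
// several views of a chessboard of known geometry.
bool calibrateFromChessboard(const std::vector<cv::Mat>& images,
                             cv::Size boardSize,      // inner corners, e.g. {9, 6}
                             float squareSize,        // square edge length in mm
                             cv::Mat& K, cv::Mat& distCoeffs)
{
    std::vector<std::vector<cv::Point2f>> imagePts;
    std::vector<std::vector<cv::Point3f>> objectPts;

    // Planar board model: corner coordinates in the board's own frame.
    std::vector<cv::Point3f> board;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            board.emplace_back(x * squareSize, y * squareSize, 0.0f);

    for (const cv::Mat& img : images) {
        std::vector<cv::Point2f> corners;
        if (!cv::findChessboardCorners(img, boardSize, corners)) continue;
        imagePts.push_back(corners);
        objectPts.push_back(board);
    }
    if (imagePts.empty()) return false;

    std::vector<cv::Mat> rvecs, tvecs;   // per-view board poses
    cv::calibrateCamera(objectPts, imagePts, images[0].size(),
                        K, distCoeffs, rvecs, tvecs);
    return true;   // rvecs[0]/tvecs[0] also give the desk-plane pose
}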

4.3.2 Desktop Plane Definition

The definition of the interaction (desktop) plane is achieved using the calibration grid in figure 4. The thickness of the paper is considered negligible and a planar homography between the points on the calibration grid and the imaging plane is defined as a series of translations and rotations. The relative positions of the focal point of the camera and three points on the calibration grid are sufficient to obtain a unique solution. However, in order to deal with inaccuracies due to noise we use all points of the calibration grid. An optimal solution for the plane equation is found by using a least squares method, which minimises the distances of the points to the proposed plane.
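The paper uses the “Geometric Tools” library for this step (see section 5); an equivalent total-least-squares fit can be sketched with OpenCV’s SVD, taking the singular vector of the centred point matrix with the smallest singular value as the plane normal:

#include <opencv2/opencv.hpp>
#include <vector>

// Fit a plane normal.dot(x) + d = 0 to 3D points, minimising the
// perpendicular distances of the points to the plane.
void fitPlane(const std::vector<cv::Point3f>& pts,
              cv::Vec3f& normal, float& d)
{
    cv::Point3f c(0, 0, 0);
    for (const cv::Point3f& p : pts) c += p;
    c *= 1.0f / pts.size();                 // the centroid lies on the plane

    cv::Mat A((int)pts.size(), 3, CV_32F);  // centred point matrix
    for (int i = 0; i < (int)pts.size(); ++i) {
        A.at<float>(i, 0) = pts[i].x - c.x;
        A.at<float>(i, 1) = pts[i].y - c.y;
        A.at<float>(i, 2) = pts[i].z - c.z;
    }
    cv::Mat w, u, vt;
    cv::SVD::compute(A, w, u, vt);
    // Last row of vt = right singular vector with smallest singular value.
    normal = cv::Vec3f(vt.at<float>(2, 0), vt.at<float>(2, 1), vt.at<float>(2, 2));
    d = -normal.dot(cv::Vec3f(c.x, c.y, c.z));
}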

4.3.3 Light Source Estimation

Given the intrinsic and extrinsic parameters of the camera and the equation of the desktop plane in relation to the camera coordinate system it is possible to unambiguously determine the positions of shadow pixels in the interaction plane. If we know two 3D points and the locations of their shadows on the interaction plane, we can compute the light source location as the intersection of the lines formed by each point and its shadow. If the lines are parallel then the light source is at infinity (e.g., when using sunlight in an outdoor application).

In order to avoid the need for special 3D objects for calibration, we use a series of predefined hand positions of the user. The first position is that of the hand with fingers close together, placed flat on the desk (see figure 5). The second posture is a hand made into a fist, with only the index finger making a pointing gesture. The fist should be placed flat against the desk (see figure 6). The height h of the finger tip of the index finger relative to the interaction plane is assumed to be 80% of the width of the bounding box of the hand position in figure 5. This approximation was motivated by the fact that the index finger is the fourth finger of the hand, and it gave good results in our experiments.

By sliding the hand around in the second configuration, as illustrated in figure 6, we obtain a sequence of 3D points and corresponding shadow positions, which we can then use to estimate the light source position by using the closest intersection point of all rays, or using a light source at infinity if the rays are approximately parallel.

Figure 5: A bounding box is fitted to the segmented hand in order to determine the width of the hand, and the height h of the index finger above the interaction plane for the light calibration step.
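The “closest intersection point of all rays” can be posed as a least-squares problem: for rays a_i + t d_i with unit directions d_i, the point L minimising the summed squared distances to all rays solves (sum_i (I - d_i d_i^T)) L = sum_i (I - d_i d_i^T) a_i. A minimal sketch, with the near-parallel (light at infinity) case left to the caller:

#include <opencv2/opencv.hpp>
#include <vector>

// Least-squares point closest to a bundle of rays (origins + unit directions).
cv::Vec3d estimateLight(const std::vector<cv::Vec3d>& origins,
                        const std::vector<cv::Vec3d>& dirs)
{
    cv::Matx33d M = cv::Matx33d::zeros();
    cv::Vec3d b(0, 0, 0);
    for (size_t i = 0; i < origins.size(); ++i) {
        // Projector onto the plane perpendicular to the ray direction.
        cv::Matx33d P = cv::Matx33d::eye() - dirs[i] * dirs[i].t();
        M += P;
        b += P * origins[i];
    }
    // Near-parallel rays make M ill-conditioned: use SVD, and treat that
    // case as a light source at infinity (orthogonal shadow projection).
    return M.solve(b, cv::DECOMP_SVD);
}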

4.4 Computation of 3D Finger Tip Positions

3D positions of objects can be computed from their 2D position on the image plane and their shadow position using calculations similar to epipolar geometry in stereo vision. In our case the light source and interaction surface provide similar information as the second camera used in stereo vision. The set-up would not be suitable for general applications due to problems with shadows being frequently occluded. However, for touch surface style interactions our configuration provides the required information since users sit opposite to the camera, the light source is on the side, and the hand is usually approximately flat. The 3D finger tip position O is calculated as illustrated in figure 7: a line is cast from the view point position V through the shadow’s position S′ on the view plane. The intersection of this line with the interaction plane (obtained in the calibration step) is the shadow’s position S. The unknown 3D position of the point O is given by the intersection of the line from S to the light source L and the line from the view point V through O′, the projection of O onto the view plane.

Figure 6: The light source is estimated as the intersection of the lines formed by the tips of the index finger and the corresponding shadow points. The shadow points lie in the interaction plane and the finger tip height above the plane is determined by placing the hand flat onto the interaction plane as illustrated in figure 5.

In our application the points O and S are the finger tip positions and their shadows, which are calculated as explained in subsection 4.2. Note that the above calculation can be applied to any object for which a corresponding shadow can be identified. For example, figure 2 shows the use of a pen as interaction device.
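The first step of this construction, recovering S by intersecting the viewing ray through S′ with the interaction plane, is a standard ray–plane test; a minimal sketch, with the plane given as n.dot(x) + d = 0 from the calibration step:

#include <opencv2/opencv.hpp>
#include <cmath>

// Intersect the ray p(t) = origin + t * dir with the plane n.dot(x) + d = 0.
bool rayPlane(const cv::Vec3d& origin, const cv::Vec3d& dir,
              const cv::Vec3d& n, double d, cv::Vec3d& hit)
{
    double denom = n.dot(dir);
    if (std::abs(denom) < 1e-9) return false;    // ray parallel to the plane
    double t = -(n.dot(origin) + d) / denom;
    if (t < 0) return false;                     // intersection behind camera
    hit = origin + t * dir;
    return true;
}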

Once a point can be tracked in three dimensions, the detection of touches can be defined by specifying a threshold for the distance between the point and the interaction plane. This is necessary since a finger has a non-zero thickness. The threshold for this was determined experimentally.

Figure 7: The setup of our virtual touchscreen. A light source illuminates the hand, which casts a shadow onto the interaction plane. The position of a 3D point O can be computed using the light source position L, the web cam (view point) position V, and the projections O′ and S′ of O and S, respectively, onto the view plane.

5 Implementation

Our application was developed in C/C++ using the OpenCV library (OpenCV 2011) for camera calibration, colour space conversion, pixel colour averaging, and image segmentation. We also used the “Geometric Tools” library (Eberly 2011), particularly for the definition of the desk plane.

5.1 Segmentation

For image segmentation we used the OpenCV functions cvAcc and cvAbsDiff to accumulate and calculate absolute differences for images. The results of these operations are image data structures (IplImage) with the same dimensions as the input images. The computation of averages is accomplished by using cvConvertScale to divide by the number of captured images.

All subsequent threshold values refer to 8 bit numbers (0 to 255) for the corresponding colour channels. In order to place thresholds above and below the average pixel values, the average difference values are scaled and subtracted from/added to the average. Typical scaling values for setting the thresholds above and below the average are 7.0 and 6.0 respectively, resulting in a range of approximately 2.5% on either side. Any value falling outside these thresholds is considered foreground and anything which falls between these thresholds is considered background. For the shadow detection RGB colours were converted into HSV colours using cvCvtColor. Typical thresholds for the value and saturation channels were 5.0 and 2.0, respectively.

Pixel classification was achieved using the cvInRange function, which outputs monochromatic images that can be used as masks for subsequent operations. For example, inverting the shadow mask using cvNot and combining it with the foreground mask using cvAnd creates a mask for the hand (all pixels which are not background and not shadow).

Noise in the image data resulted in some pixels being misclassified. We removed it by applying a Gaussian smoothing with a 9x9 window to the hand mask and shadow mask. Holes in the hand region were filled using a morphological operator.
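A sketch of this mask algebra and cleanup, using the modern C++ API in place of the legacy cvInRange/cvNot/cvAnd calls named above; re-binarising after the blur and the elliptical closing kernel are assumptions:

#include <opencv2/opencv.hpp>

// hand = foreground AND NOT shadow, followed by smoothing and hole filling.
void buildHandMask(const cv::Mat& fgMask, const cv::Mat& shadowMask,
                   cv::Mat& handMask)
{
    cv::Mat notShadow;
    cv::bitwise_not(shadowMask, notShadow);
    cv::bitwise_and(fgMask, notShadow, handMask);

    // Suppress misclassified pixels (9x9 Gaussian, then re-binarise).
    cv::GaussianBlur(handMask, handMask, cv::Size(9, 9), 0);
    cv::threshold(handMask, handMask, 127, 255, cv::THRESH_BINARY);

    // Fill holes in the hand region with a morphological closing.
    cv::morphologyEx(handMask, handMask, cv::MORPH_CLOSE,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(9, 9)));
}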

5.2 Computation of 3D Finger Tip Positions

In subsection 4.4 we explained that the 3D position of finger tips is defined as the intersection of two rays. Due to noise, calibration errors, and numerical inaccuracies the two rays are unlikely to intersect in practice. Instead we use an algorithm from Paul Bourke (Bourke 2011) to find the point where the distance between the two rays is minimal.
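A sketch of that computation: for rays a + t u and b + s v, the parameters of the mutually closest points follow from the standard shortest-segment equations, and the midpoint of that segment is taken as the 3D fingertip (parallel rays, det close to 0, would need a guard in practice):

#include <opencv2/opencv.hpp>

// Midpoint of the shortest segment between two skew rays.
cv::Vec3d closestPointBetweenRays(const cv::Vec3d& a, const cv::Vec3d& u,
                                  const cv::Vec3d& b, const cv::Vec3d& v)
{
    cv::Vec3d w = a - b;
    double uu = u.dot(u), uv = u.dot(v), vv = v.dot(v);
    double uw = u.dot(w), vw = v.dot(w);
    double det = uu * vv - uv * uv;          // ~0 when the rays are parallel
    double t = (uv * vw - vv * uw) / det;    // parameter on the first ray
    double s = (uu * vw - uv * uw) / det;    // parameter on the second ray
    return 0.5 * ((a + t * u) + (b + s * v));
}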

Given the 3D positions of points of interest and the definition of the desk plane, touch interactions are defined by instances where the distance between the 3D point and the shadow point falls below a given threshold. We determined experimentally that using a threshold of 10mm gave best results, which represents approximately the thickness of a finger. We use a coordinate transformation matrix for easy conversion between the camera and desk coordinate system.
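The resulting touch test is a one-liner; a sketch, assuming a unit plane normal and millimetre units:

#include <opencv2/opencv.hpp>
#include <cmath>

// A contact is registered when the reconstructed fingertip lies within
// ~10 mm (roughly a finger's thickness) of the plane n.dot(x) + d = 0.
bool isTouch(const cv::Vec3d& tip, const cv::Vec3d& n, double d,
             double thresholdMm = 10.0)
{
    return std::abs(tip.dot(n) + d) < thresholdMm;
}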

6 Results

We evaluated our prototype by investigating the influence of different parameters, by creating three proof-of-concept applications, and by an informal user test. We compared the interface to traditional mouse and touch interfaces and to the “Shadow Gestures” paper, the closest related research.


6.1 Experimental Setup

As light source we used an ordinary ceiling light with an 11W fluorescent bulb. This is not a perfect point light source and resulted in a soft shadow. The curtains of the room were closed, resulting in low ambient light without specular reflections from sunlight. We used a Logitech QuickCam Pro 9000 web cam with a resolution of 640x480 pixels. The configuration of the camera, desk, and light source for development and testing is shown in figure 8. The interaction surface was a portion of a desk, which was slightly glossy and prone to some measure of reflection. It also produced a sizable highlight as shown in figure 2. Tests with more matte surfaces improved results, but adding constraints in the form of allowable surface properties was deemed too restricting for practical applications.

Figure 8: The virtual touch screen set-up employed during development. Left: the positions of the point light source, desk and camera. Right-bottom: the interaction surface (desk) with camera and calibration grid. Right-top: the orientation of the camera with respect to the monitor and desk.

All computations were performed on a PC with an Intel Core i5 750 2.67 GHz CPU, 4 GB 1333 MHz RAM, and an Nvidia GeForce GTX 460 graphics card with 768 MB GDDR5 and 336 CUDA cores. Note that no hardware acceleration was used for the computer vision algorithms and that the CUDA toolkit for OpenCV was not installed. Hence the performance on other systems should scale approximately linearly with CPU performance. With the above configuration we achieved real-time performance of above 20 frames per second.

6.2 Influence of Parameters

6.2.1 Thresholds

Initial tests showed that the utilised segmentation technique worked well over non-uniform static backgrounds for stable lighting environments. The threshold values described in the previous sections required careful tuning. We developed an interface for simplifying this task, but it is still a manual process, which needs to be repeated if the setup is changed. Ideally this process should be automated, e.g. by performing a sequence of predefined hand postures and gestures.

6.2.2 Environmental Parameters

The background needs to be static, so the system cannot be used in environments where people or objects are moving around, which might partially obscure the background or change the lighting environment (e.g., cast a shadow on the desk). We haven’t developed applications with two-hand input, but as long as the hands and their shadows are separated, we can’t see any reason why this should not work. Our model assumes that the foreground colours differ from the background colour; hence it would not work for set-ups where the desk and user’s skin colour are similar or where the background’s material does not show visible shadows.

The lighting environment needs to provide a single point light source. We found that the application still works for a ceiling lamp generating slightly soft shadows, but it would not work for large fluorescent lights. The algorithms were designed such that the application would also work in an outdoor environment using the sun as point light source. In this case the light source would be positioned at infinity, resulting in an orthogonal shadow projection. This will have to be tested in future work.

6.2.3 Calibration Parameters

The calibration of the desktop plane uses well-tested algorithms and no problems were noticed. The calibration of the light source, described in subsection 4.3.3, uses several simplifications. The fact that the height h in figure 6 is only a rough estimation of the actual height introduces errors when computing the depth values from the positions of the fingers and their shadows. However, this was not considered a problem, since the errors only affect absolute depth estimates. For the applications we tested, such as moving or extruding an object in 3D, relative height changes are sufficient. In order to see why, note that the traditional mouse interface uses distance scaling dependent on accelerations (Microsoft Corporation 2002). This is considered intuitive despite resulting in a non-linear behaviour and the same physical mouse position during interactions corresponding to different screen positions. Our tests indicate that the main factors for usability are that the user can perceive 3D cursor motions and their relationship to hand motions, and that their effect on the 3D application is predictable.

6.3 Proof-of-Concept Applications

6.3.1 Single-Touch Interface

Our first proof-of-concept application is a sketch-type interface using the outermost point of the hand as a single interaction point. This can be a finger tip, the tip of the entire hand, or a stylus, as explained in subsection 4.2.1 and illustrated in figures 1 and 2.

Figure 9 demonstrates that hand and shadow detection can be used to successfully detect contact with the interaction surface and subsequent finger motions. The hand tip and its shadow on the desk surface are marked with white points. When contact is detected the point colour becomes green. The 3D coordinates displayed in the figure show the 3D position of the interaction point. It can be seen that during contact the depth value can vary slightly due to noise, a changing finger angle, and compression of the finger tip.

Touch detection is sufficiently accurate to draw complex shapes as shown in figure 10. Some jitter is visible, partially due to finger motions and partially due to noise in the detected finger position.

6.3.2 Multi-Touch Interface

The second proof-of-concept application explores the possibility of multi-touch interactions using multiple finger tips. Figure 11 shows that as long as the hand shadow is fully visible 3D positions can be tracked for all fingers. The association of finger tips and shadow points is achieved by minimising the distances between each pair of points. If the hand is rotated or the hand overlaps the shadow region the finger tip detection can fail for the shadow region. In that case the 3D position of the corresponding finger tip cannot be computed. Improvements might be possible by using different finger tip detection techniques or by using a hand model and computing a 3D configuration matching both the hand and shadow image.

Figure 9: Examples of the sketch application. The sketch starts as soon as the finger touches the interaction surface and tracks the finger motion until the contact stops.

Figure 10: Different shapes drawn using different hand gestures with a single interaction point. Shapes can be drawn by connecting multiple contact points with line segments (left), or by a continuous finger motion (middle and right).

Figure 11: Tracking of the 3D position of multiple fingers (left). The displayed numbers give the height of each finger tip in millimetres. The green lines connect finger tips with the corresponding shadow points. If the hand moves closer to the interaction plane the detection of convex defects fails and the 3D position cannot be computed (middle and right).

6.4 3D Modelling Application

The third proof-of-concept application demonstrates 3D modelling. 3D extruded objects are created by touching the interaction surface, moving the finger to draw a 2D shape, and extruding it by lifting the hand. For the configuration in figure 8 the maximum possible extrusion height was about 18cm. The current application extrudes the raw 2D sketches without modification. Spline curves could be used to smooth the sketch input and the OpenGL tessellation algorithm could be used to create a closed 3D shape.

6.5 Informal User Test

We tried our application with five users who were unfamiliar with the project and aged between 7 and 39 years. The participants were asked to use the sketching application without any explanations or introduction to the program. All users exhibited what appeared to be a high level of enjoyment, and there was some competition among the younger users (7 and 11 years old) for time using the program. Finger tracking performed as expected and described in the previous subsections. Difficulties arose as users assumed multi-touch capabilities and multi-user support would work. Furthermore, as multiple users crowded around the desk, blocking of the light source became an issue as standing people’s heads came between the light source and the desk. Also, the sometimes inconsistent touch registration on parts of the surface near the centre of the highlight and in the darker regions to the right was found to be frustrating.

Figure 12: Examples of a 3D modelling application: 3D shapes are created by drawing a 2D shape using a single-finger gesture on the surface, and then extruding it by lifting the hand.

Users did not seem to have any intuitive grasp of which areas were likely to respond to touches better than others. User reactions indicated that the most important improvements for future work are multi-touch support and a more robust touch registration.

6.6 Comparison with Traditional Interfaces

The most popular traditional positional input devices are mice and touch screens. Similar to mice, our input device does not allow direct interaction with the display, but instead uses a cursor moved by the hand motions. This maximises the visible screen space, but introduces a level of indirection.

In contrast to both mice and touch screens we can input 3D positions. However, we feel that the mouse interface is by far the most ergonomic one. The mouse interface also seems to be the most precise and allows scaling of motions using accelerations. While this could also be implemented for our interface, the implementation is not as straightforward since the start and end of motions would have to be indicated either by hand gestures or surface contacts.

While our interface allows multi-touch interactions in principle, we found this difficult in practice due to self-occlusions. Similar to mice, our interface is portable and independent of the display device. As with most such interfaces, a smooth motion, e.g. drawing a smooth curve, is difficult without postprocessing. We did not make any formal tests of the achievable precision, although our informal tests indicate that in terms of point selection our interface is less precise than a mouse.


6.7 Comparison with “Shadow Gestures”

We found only one other paper using hand gestures and their shadows for 3D interactions (Segen & Kumar 1999). In contrast to our research the paper puts many more constraints on the environment, including a uniform background, pre-calibrated cameras, and a calibration step for the light source, which is not described in the paper. While the authors mention that they compute 3D points, most interactions use 2D information and surface touches are not detected.

In contrast to our goal of extending touch screen interactions, Segen and Kumar use the shapes formed by two fingers to define different gestures. The capabilities of the resulting gesture interface are demonstrated with different example applications, such as manipulating 3D objects or steering a plane through a 3D terrain. This seems to be predominantly achieved by the orientation and shape of the two fingers, rather than by computing accurate 3D positions. The authors mention that their system fails if the hand moves close to the table.

7 Conclusion

The mouse and keyboard combination of input devices has dominated the desktop computing interface for decades. The explosion of portable computing devices such as the iPhone and iPad has made intuitive touch and motion controls popular, and they are now slowly finding their way to the desktop platform.

An analysis of the strengths of the mouse and keyboard configuration and the weaknesses of current approaches to touch interfaces motivated a new approach combining the advantages of both of these interfaces. In this paper we attempted to develop such an interface using inexpensive hardware (a webcam and a light bulb).

We have presented a prototype of a 3D virtual touch screen which uses a web cam to track the position of the user’s fingers and their shadows and from this computes 3D points. We showed that the system works well for single touch inputs, but problems exist for multi-touch interactions due to finger shadows being partially occluded.

In contrast to previous work by Segen and Kumar (Segen & Kumar 1999) we provide a more stable tracking technique allowing a wider range of environmental conditions and simpler calibration steps. Most importantly we move away from gestures and towards touchscreen emulation, and hence simulate and extend an input mechanism that is already being widely adopted outside the desktop platform.

Another advantage of our application is that it can be used in combination with a traditional mouse interface. While mouse interactions occur our interface can be disabled, and when the mouse is moved aside (not visible on the interaction surface) the 3D touch capabilities can be enabled.

8 Future Work

Two key categories of work remain ahead. First, segmentation of shadows and hands from the background must be improved. Second, occlusion of relevant shadow information must be dealt with. For shadow detection and background subtraction, a system which automatically adjusts to different backgrounds is necessary. Thresholds based on the lightness or darkness of the background on a per-pixel basis should be produced. Additionally, improved hand detection is required, e.g., by using skin colour detection (Kakumanu et al. 2007, Vassili et al. 2003, Liu et al. 2011). The tracking and prediction of finger and shadow movements should also be investigated as a method for dealing with transitory occlusion periods.

In previous work we developed “LifeSketch”, an application for the sketch-based modelling of interactive 3D environments (Yang & Wunsche 2010, Guan & Wunsche 2011, Olsen et al. 2011, Schauwecker et al. 2011). The interface presented in this paper will significantly simplify many of the presented modelling interactions. For example, we showed that complex buildings can be modelled from 2D outlines, but often additional 3D parameters such as extrusion depth are required (Olsen et al. 2011). Similarly, animating 3D objects is cumbersome using only 2D touch/sketch interactions (Schauwecker et al. 2011).

8.1 Future Applications

The applications developed so far and described in this paper only show a small fraction of what the input mechanism produced here can accomplish. Expansion of application capabilities should be a future priority. In June 2011, Microsoft gave the first official public viewing of Windows 8. The default interface for the world’s most popular operating system is being designed from the ground up to be touch-based. Microsoft currently states that a mouse and keyboard will still work for interactions, but the design should create increased demand for low-cost desktop-based touch interfaces.

Our design works for any configuration using a single point light source which casts shadows onto a roughly planar surface. We are hence interested in trying the system in outdoor environments using the sun as light source. Problems will occur when the sun stands low and the shadows are stretched. Also the sun’s position will move over time, which requires occasional recalibration. Two such calibrations should be enough to interpolate and predict further position changes.

References

Argyros, A. A. & Lourakis, M. I. A. (2006), Binocular hand tracking and reconstruction based on 2d shape matching, in ‘Proc. International Conference on Pattern Recognition (ICPR ’06)’, pp. 207–210.

Bourke, P. (2011), ‘Geometry, surfaces, curves, polyhedra’. http://paulbourke.net/geometry.

Bradski, G. & Kaehler, A. (2008), ‘Learning OpenCV: Computer vision with the OpenCV library’.

Chen, Q., Georganas, N. & Petriu, E. (2007), Real-time vision-based hand gesture recognition using haar-like features, in ‘Instrumentation and Measurement Technology Conference Proceedings, 2007. IMTC 2007. IEEE’, pp. 1–6.

Chong, H. Y., Gortler, S. J. & Zickler, T. (2008), ‘A perception-based color space for illumination-invariant image processing’, ACM Trans. Graph. 27(3), 1–7.

Eberly, D. (2011), ‘Geometric tools’. http://www.geometrictools.com.

Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D. & Twombly, X. (2005), A review on vision-based full DOF hand motion estimation, in ‘Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Workshops - Volume 03’, IEEE Computer Society, Washington, DC, USA, pp. 75–82.


Erol, A., Bebis, G., Nicolescu, M., Boyle, R. D. & Twombly, X. (2007), ‘Vision-based hand pose estimation: A review’, Journal of Computer Vision and Image Understanding 108, 52–73.

Gonzalez, B. & Latulipe, C. (2011), BiCEP: bimanual color exploration plugin, in ‘Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems’, CHI EA ’11, ACM, New York, NY, USA, pp. 1483–1488.

Guan, L. & Wunsche, B. C. (2011), Sketch-based crowd modelling, in ‘Proceedings of the 12th Australasian User Interface Conference (AUIC 2011)’, pp. 67–76. http://www.cs.auckland.ac.nz/~burkhard/Publications/AUIC2011_GuanWuensche.pdf.

Hardenberg, C. & Berard, F. (2001), Bare-hand human-computer interaction, in ‘Proceedings of the 2001 workshop on Perceptive user interfaces’, PUI ’01, ACM, New York, NY, USA, pp. 1–8.

Homma, K. & Takenaka, E.-I. (1985), ‘An image processing method for feature extraction of space-occupying lesions’, Journal of Nuclear Medicine 26(12), 1472–1477.

Hsieh, C.-C., Tsai, M.-R. & Su, M.-C. (2008), A fingertip extraction method and its application to handwritten alphanumeric characters recognition, in ‘Proceedings of the 2008 IEEE International Conference on Signal Image Technology and Internet Based Systems’, SITIS ’08, IEEE Computer Society, Washington, DC, USA, pp. 293–300.

Kakumanu, P., Makrogiannis, S. & Bourbakis, N. (2007), ‘A survey of skin-color modeling and detection methods’, Pattern Recogn. 40(3), 1106–1122.

Laptev, I. & Lindeberg, T. (2001), Tracking of multi-state hand models using particle filtering and a hierarchy of multi-scale image features, in ‘Proceedings of the Third International Conference on Scale-Space and Morphology in Computer Vision’, Scale-Space ’01, Springer-Verlag, London, UK, pp. 63–74.

Latulipe, C., Mann, S., Kaplan, C. S. & Clarke, C. L. A. (2006), symSpline: symmetric two-handed spline manipulation, in ‘Proceedings of the SIGCHI conference on Human Factors in computing systems’, CHI ’06, ACM, New York, NY, USA, pp. 349–358.

Lee, M. & Woo, W. (2003), ARKB: 3d vision-based augmented reality keyboard, in ‘ICAT’.

Liu, R. (2010), A framework for webcam-based hand rehabilitation exercises, BSc Honours Dissertation, Graphics Group, Department of Computer Science, University of Auckland, New Zealand. http://www.cs.auckland.ac.nz/~burkhard/Reports/2010_S1_RuiLiu.pdf.

Liu, R., Wunsche, B. C., Lutteroth, C. & Delmas, P. (2011), A framework for webcam-based hand rehabilitation exercises, in ‘Proceedings of VISAPP 2011’, pp. 626–631. http://www.cs.auckland.ac.nz/~burkhard/Publications/VISAPP2011_LiuEtAl.pdf.

Mahmoudi, F. & Parviz, M. (2006), ‘Visual hand tracking algorithms’, Geometric Modeling and Imaging–New Trends 0, 228–232.

Malik, S. & Laszlo, J. (2004), Visual touchpad: a two-handed gestural input device, in ‘Proceedings of the 6th international conference on Multimodal interfaces’, ICMI ’04, ACM, New York, NY, USA, pp. 289–296.

Microsoft Corporation (2002), ‘Pointer ballistics for Windows XP’. http://msdn.microsoft.com/en-us/windows/hardware/gg463319.aspx.

Olsen, D. J., Pitman, N. D., Basak, S. & Wunsche, B. C. (2011), Sketch-based building modelling, in ‘Proceedings of GRAPP 2011’, pp. 119–124. http://www.cs.auckland.ac.nz/~burkhard/Publications/GRAPP2011_OlsenEtAl.pdf.

OpenCV (2011), ‘homepage’. http://opencv.willowgarage.com/wiki.

Park, J. & Yoon, Y.-L. (2006), ‘LED-glove based interactions in multi-modal displays for teleconferencing’, International Conference on Artificial Reality and Telexistence, pp. 395–399.

Schauwecker, K., van den Hurk, S., Yuen, W. & Wunsche, B. C. (2011), Sketched interaction metaphors for character animation, in ‘Proceedings of GRAPP 2011’, pp. 247–252. http://www.cs.auckland.ac.nz/~burkhard/Publications/GRAPP2011_SchauweckerEtAl.pdf.

Segen, J. & Kumar, S. (1999), ‘Shadow gestures: 3d hand pose estimation using a single camera’, Computer Vision and Pattern Recognition, IEEE Computer Society Conference on 1, 1479.

SIMTICS Ltd. (2011), ‘SIMTICS Ltd. homepage’. http://www.simtics.com/.

Song, P., Winkler, S., Gilani, S. O. & Zhou, Z. (2007), Vision-based projected tabletop interface for finger interactions, in ‘Proceedings of the 2007 IEEE international conference on Human-computer interaction’, HCI’07, Springer-Verlag, Berlin, Heidelberg, pp. 49–58.

Stenger, B., Mendonça, P. R. S. & Cipolla, R. (2001), ‘Model-based 3d tracking of an articulated hand’, Computer Vision and Pattern Recognition, IEEE Computer Society Conference on 2, 310.

Stenger, B., Thayananthan, A., Torr, P. H. S. & Cipolla, R. (2006), ‘Model-based hand tracking using a hierarchical Bayesian filter’, IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1372–1384.

Terajima, K., Komuro, T. & Ishikawa, M. (2009), Fast finger tracking system for in-air typing interface, in ‘Proceedings of the 27th international conference extended abstracts on Human factors in computing systems’, CHI EA ’09, ACM, New York, NY, USA, pp. 3739–3744.

Vassili, V. V., Sazonov, V. & Andreeva, A. (2003), A survey on pixel-based skin color detection techniques, in ‘Proc. Graphicon’, pp. 85–92.

Wang, R. Y. & Popovic, J. (2009), Real-time hand-tracking with a color glove, in ‘SIGGRAPH ’09: ACM SIGGRAPH 2009 papers’, ACM, New York, NY, USA, pp. 1–8.

Wang, X., Zhang, X. & Dai, G. (2007), Tracking of deformable human hand in real time as continuous input for gesture-based interaction, in ‘Proceedings of the 12th international conference on Intelligent user interfaces’, IUI ’07, ACM, New York, NY, USA, pp. 235–242.

Yang, R. & Wunsche, B. C. (2010), LifeSketch - A framework for sketch-based modelling and animation of 3D objects, in ‘Proceedings of the Australasian User Interface Conference (AUIC 2010)’, pp. 1–10. http://www.cs.auckland.ac.nz/~burkhard/Publications/AUIC2010_YangWuensche.pdf.


Evaluating Indigenous Design Features Using Cultural Dimensions

Reece George, Keith Nesbitt, Michael Donovan, John Maynard
University of Newcastle, Callaghan 2300, NSW

{Reece.George, Keith.Nesbitt, Michael.Donovan, John.Maynard}@newcastle.edu.au

Abstract

This study compares previous analytical findings in the area of cultural web design using Hofstede’s dimensions with findings from a three-year case study. This case study used an ethnographic and user-centric approach to better integrate cultural requirements into the website for a specific Indigenous community. We overview this design process and describe the ten key design features that were identified in the project. These design features were considered essential for capturing the cultural identity of the community. They are relevant to designers of Indigenous websites and designers considering culture as part of their interface design process. We evaluate these design features by considering them in terms of Hofstede’s cultural model. Some correlations have previously been found between Hofstede’s cultural dimensions and the structural and aesthetic design features that are used in websites from different cultures. We compare the ten design features identified from our case study with the outcomes we might expect, given the measured position of the group on Hofstede’s cultural dimensions. The best correlations occurred on the power distance index, where the navigation, organisation and image content conformed with expectations. However, a number of contrary results were also found.

Keywords: Culture, Indigenous, Culturability, Web Design, Hofstede’s Model

1 Introduction

Culture is fundamental to all areas of design, including the design of user interfaces. Culture develops through social interactions that occur at various scales, from smaller community groups to entire nations (Hofstede 2005). Indeed many levels of culture may co-exist based on various social groupings associated with ethnicity, religion, language, generation, gender or work place (Hofstede 2005). While culture can be seen in behaviour, traditions, community values and aesthetics, many aspects remain hidden. Culture is made up of a large mass of social rules that, like an iceberg, lie mostly hidden below the surface (French and Bell, 1979).

The effect of culture on the usability of an interface has been described as “culturability” (Barber and Badre 1998). Different cultural groups have been shown to employ quite different usage strategies with the same interface (Faiola and Matei 2004). A number of specific cultural factors that impact on usability have been described; these include the use of language and the representation of time, currency and other units of measure (Fernandez 2000). More subtle cultural factors such as imagery depicting body positions or social contexts and the use of symbols and preferred aesthetics are also important (Fernandez 2000).

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126. H. Shen and R. Smith, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

In terms of interface design two distinct approaches to treating culture have been described. One approach is known as "globalization" as it involves the development of a generic, culturally neutral design that can be deployed globally (Tixier 2005). By contrast, the term “localization” requires the designer to adapt the interface so it specifically targets the culture of the local group (Shannon 2000).

Localisation has been the focus of our own three-year case study. In particular we have been trying to understand the process of incorporating the cultural identity of an Indigenous Australian community into the design of a website (George et al. 2010). To achieve this we have adopted an extended ethnographic design process that focuses on community involvement. The process has used focus groups, interviews and iterative prototyping to help identify important cultural requirements for the group. The outcomes from this study include a group of ten key design features that are considered essential for capturing the specific culture of our target group. We describe our process and these outcomes in more detail in the following sections.

While considered essential to our own project aims, an ethnographic design approach such as ours is no doubt time-consuming. A contrasting approach for considering cultural requirements is to first try and characterise the target group based on some well-defined ‘cultural’ measures. Unfortunately, agreeing on such cultural dimensions is problematic, with as many as 29 different measures having been previously used in cross-cultural design (Scahdwitz 2008). In terms of interface design the most frequently cited model for such cultural measures is the one developed by the cultural theorist Geert Hofstede (2005). Hofstede’s cultural model is not without criticisms or debate (Ess and Sudweeks 2005, Callahan 2005) but it does provide a pragmatic, structured framework for studying culture (Williamson 2002). We will describe Hofstede’s cultural dimensions in later sections and also discuss some of the criticisms of the model.

Having characterised the target users in terms of cultural dimensions, the next step is to adopt appropriate patterns or guidelines that correlate well with their cultural measures. The Hofstede model is further relevant as there have been several studies that employ this model for studying the design of culture in websites (Callahan 2005, Marcus and Gould 2000, Robbins and Stylianou 2003, Singh and Pereira 2005, Yuan et al. 2005). Indeed many of these try to identify the key design features associated with different cultural dimensions. Of particular interest is a study that analysed university websites from eight different countries and reported on how well various design features correlated with Hofstede’s cultural dimensions (Callahan 2005).

It is the results from such analytical studies that we wish to compare our own results against. A distinguishing factor for our project is that we have identified design features through an extended ethnographic process and thus we have a good understanding of the motivation behind each design element. This contrasts with the “blind” analysis of web site elements undertaken by these other studies. This is not a criticism of these studies, as the blind analysis is a key part of their methodology. However, it does mean that any insights into why particular design elements were used remain unknown.

For the final phase of our project we addressed the question: “Given our target group, positioned using Hofstede’s cultural dimensions, how well do our identified design features match with previously reported expectations?” To answer this question we used Hofstede’s revised “Values Survey Module” from 2008 (VSM08 2008) to measure our target community’s cultural position along five dimensions. We then analysed our 10 key design outcomes in terms of these dimensions, comparing our outcomes with predicted outcomes. These predicted or expected outcomes were derived from the other studies that use Hofstede’s model.

The results are mixed, with some design features, such as aesthetics and overall structural style, matching quite well with expectations. Others, such as the use of interactive games, are at odds with previous studies. Some features, such as the use of humour in the site, have not been previously studied. In general our findings support the idea that Hofstede’s model, while a useful tool to help consider cultural requirements in interface design, is in no way prescriptive or exact.

2 Project Description

This work was motivated by a request to design a more culturally acceptable website for the Wollotuka Institute. The Wollotuka Institute is an Indigenous study centre. It is part of the University of Newcastle, a large regional university about 170 kilometres north of Sydney, located in the traditional lands of the Awabakal nation. Wollotuka supports a broad range of Indigenous programs incorporating administrative, academic and research activities.

Wollotuka provides support and development services for Indigenous staff and students. It employs about 40 full time staff, who come from a wide range of Indigenous tribes. The community embraces a broad range of urban, regional and educational backgrounds. The diverse cultures of this community became the focus of a three-year case study that centred around the redesign of their website.

From this group 12 subjects were selected by convenience to be directly involved in the study. These 12 participants included five women and seven men who represent a range of Indigenous tribes, including the Worrimi, Eora, Gumbaynggir, Bundjalung, Murray Island, Wirajuri, Wonnarua and Awabakal nations. The participants thus represent a range of Australian and Torres Strait Island Aboriginal culture rather than a single tribal perspective. Of these participants five have academic roles at the Institute while the other seven perform important administrative functions. Nine of them are graduates and four of these have post-graduate qualifications. The primary researcher is from an Indigenous tribe in North Western Australia, and was responsible for conducting, analysing and reporting on outcomes from the study.

One question addressed in the overall study was "What key design features should be incorporated into a website to meet the cultural requirements of this Indigenous group?" By “key design features” we mean general design factors dealing with the look and feel of the site. We did not set out to capture all the functions or user tasks to be performed on the site. Nor did we consider in detail strict technical limitations in the design, for example, network bandwidth or cross-platform browsing issues. The planned outcome of the project was not a fully functional and deployed website but rather a consensus about the critical design factors that were required to address the cultural requirements of the community.

We adopted a user-centric approach to design, as the involvement of the community was seen as essential; this approach relies on the active involvement of representative users throughout the process (Nielsen 1993). In conjunction with this approach we used iterative prototyping, as this is described as a key component for visualizing and evaluating design solutions with the users (Goransson et al. 2003). To gather and refine the design requirements we used a range of qualitative methods that included a focus group and both structured and semi-structured interviews. The 12 selected subjects formally took part in the focus group, and provided feedback through one-on-one interviews.

Figure 1: Phases of the Project


Visual Imagery - The website should reflect the physical space occupied by the school. During the focus group, participants made significant references to the local landscape. For example, the school building is “our concept of place in a contemporary cultural environment.” Much care was given to the various aspects of the landscape, both animate and inanimate during the storytelling. For instance, the “dust in the car park”, the “sign out the front” and the “flagpoles” were referenced during the discussion.

Kinship - The website should make people feel part of an extended family.

A number of the focus group stories revolved around the relevance of kinship. One member went to the extent of comparing the Wollotuka environment to “a family unit and not like a school.” The members had clear notions of how this kinship feeling works and benefits the community: “the members of Wollotuka would make themselves available to help one another if they were in trouble.”

Language - The website should use a lot of multimedia (visual and spoken language) and be informal in style.

During the focus group a participant commented: “the website needs to speak the message rather than have written text.” Another confirmed this idea: “There needs to be more than just writing about indigenous people, there needs to be elements that identify Aboriginal people.” We must also show respect for the elderly and for those members of the community who were not able to read.

Humour - The website should capture the good humour and light-heartedness of the group. It should be a fun place to visit.

The stories in the focus group were noted for the humorous content. For instance, when commenting on the design of an Aboriginal website, one participant suggested “putting the name ‘Wollotuka’ up in pink neon lights across the top of the building, like the ‘Hollywood’ sign, saying ‘Wollyworld is here’ with a big arrow.”

Community feeling - The website should be an extension of the community and make people feel welcome and supported.

Much emphasis was placed on the need for community spirit. Participants remembered typical examples. One staff member recalled large groups of people from Wollotuka going out together. The group included lecturers from other faculties and students. They would play a game of pool and have a meal. People wanted to be a part of Wollotuka because of this community spirit.

Traditional Activities - The website must showcase many traditional activities, such as painting, music, dance and ceremony.

Quite some time was spent on traditional activities. Significantly, music, dance and ceremony were spoken of in relation to creating life. Singing and dancing are related to community spirit and are also fundamental to the sharing of Aboriginal knowledge.

Table 1: Cultural Themes from Focus Group

The design process contained three main phases (see Figure 1). The initial requirements phase was used to gain an understanding of pertinent cultural design issues and to gather the initial expectations of the community. As part of this phase a number of cultural design guidelines were identified from literature (George et al. 2010). These guidelines were later used to help inform design decisions. The focus group, involving the 12 subjects in our study, was based on approaches from audience ethnography and involved storytelling (George et al. 2011). The focus group was used to identify the key cultural themes within the community and these are summarised in Table 1.

Having identified potential design issues from the literature review and appropriate themes from the focus group, we began the iterative prototyping phase. Researchers in the project met to discuss possible design features that would address the cultural requirements. The researchers were already familiar with many technical aspects of web design and implementation. Some initial design features were chosen to be prototyped. These prototypes were intended to act as props for gathering feedback about the design features.

After the first prototype was built it was shown to the subjects in the study, who then provided feedback in one-on-one interviews. The interviews were semi-structured to focus particularly on design aspects, although they also served to identify additional content information, for example, what types of images should be used. After considering this feedback the design features were refined in a further prototype stage. Once again the subjects used the prototype before providing feedback through one-on-one interviews. After two prototyping stages a list of key design features was finalised and these are described in more detail below.

The third phase of the project was designed to evaluate both the process we used and the design outcomes. The results reported here form part of that design evaluation.

3 Key Design Features

The cultural requirements for the community were expressed in ten key design features:

1. Simple structure and navigation
2. Location Map
3. Virtual Tour
4. Multimedia (video and sound)
5. Interactive Games
6. Community Links
7. Feedback Mechanism
8. Informal Language and Humour
9. Traditional Imagery and Ceremony
10. Indigenous Wiki

3.1 Simple structure and navigation

The first prototype was designed to be just a single homepage with a very basic layout avoiding too many links. Navigation was intentionally kept simple by using a menu with only six items placed at the top of the page (see Figure 2). It was intended that the content of the page be browsed by scrolling rather than through targeted selection of menu items. The selected layout supports a more holistic style of reasoning, one associated with contextual, experience-based knowledge (Dong and Lee 2008).

3.2 Location Map

A navigable satellite image map was also included in the design (see Figure 3). This design feature is intended to reinforce the spatial location of the group, a need that emerged quite strongly from the focus group. The identity of the place and the ease with which someone could find it were considered crucial in building the identity of the institution. Spatial aspects like location have previously been identified as significant in Aboriginal Indigenous culture (Turk & Trees, 1998).

Figure 2: Prototype showing a simple menu, earthy colours, handwritten fonts and informal language.

Figure 3: Satellite Image map

Figure 4: Virtual Tour of Birabahn

Figure 5: Video Introductions

3.3 Virtual Tour

We also identified the need to provide a virtual tour of the building and surroundings. Once again this was intended to reinforce the landscape where the community is situated. The tour was intended to encompass both the building itself and the external grounds. The grounds include a ceremonial site used by the community. Geographical features are said to form the foundation of Indigenous thinking (Auld 2007), and navigation by images is preferred over navigation linked to words (Williams 2002).

The results from the first prototype confirmed that this virtual tour was a key requirement of the group. The intention was to allow visitors to the site to experience the landscape. For the second prototype the virtual tour was expanded and as much functionality as possible was situated within the building and surrounds, so that navigating the building became navigation of the website. We note the success of a similar approach in a project called ‘Digital Songlines’, which represented traditional Indigenous knowledge using a landscape metaphor (Pumpa, Wyeld, and Adkins 2006).

3.4 Multimedia (video and sound)

The participants were unanimous in wanting interactive images, “video, things happening, things moving”, and not just static images. This concurs with more general cultural guidelines that recommend providing multimedia-rich environments rather than text-based ones and also including a range of audio and visual media (Buchtmann 2000, Fischer 1995). To achieve this we included a number of interactive elements. In particular we used video introductions from the academics in the school (see Figure 5). These were situated within the building to reinforce the connections with the location. Videos of traditional elders were also used to introduce visitors to the site.

3.5 Interactive Games

To provide further multimedia content, while addressing the requirement that visitors to the website would see the school as a fun place to study, we developed some casual interactive games. Two games were included (see Figure 6). One was a simple puzzle game based on card matching (Rosenzweig 2011, pp. 79-117), into which local wildlife and Indigenous art were incorporated as game elements. Players scored points by turning over two cards with matching images. A second, action game, based on the traditional gameplay of “Asteroids” (Rosenzweig 2011, pp. 239-262), was also included. In this game traditional colours and imagery were combined with informal language and humorous overtones. Instead of avoiding asteroids hitting their space ship, players had to ensure kangaroos did not crash into their ute.

These game elements were intended to provide a strong message about Wollotuka being a “fun” place. There was also a desire within the group to try to appeal to the younger generation of visitors through elements such as interactive games. Once again, the need to provide multimedia-rich environments to encourage usage of Indigenous sites has previously been described (Fischer 1995, Buchtmann 1999).


Figure 6: Two interactive games incorporating Indigenous images, informal language and humour

Figure 7: Community Links

Figure 8: Feedback Mechanism

3.6 Community Links

Many of the design features can also serve to highlight community and kinship. Traditionally, in an Aboriginal community, family life and children always come before individual pursuits (Gibb, 2006). This same theme was identified in the focus group and also reinforced by the feedback obtained during prototyping. For example, the group consistently reinforced the need to use images that represented both traditional elders and group images of the students and staff connected to Wollotuka. The use of familiar images depicting local scenes and people is a recommended technique for reinforcing the concept of community (Williams, 2002).

There was also a strong request to include not only traditional elders but also other Indigenous role models from sport and music. This met a concern that the site be relevant across multiple generations of visitors. Images of the entire cross-section of the community, from students to faculty, both young and old, were requested. Explicit links to and for the broader community were also included. These sites contained information of general relevance for the Indigenous community (see Figure 7). However, we note that many of these links might not be considered relevant for a more traditional university website.

3.7 Feedback Mechanism

During the design of the website we focused on a close personal collaboration with the group. This approach was seen as essential to the outcomes of the project. We wanted to ensure this sense of collaboration continued to occur and was also extended to all users of the website. Thus we incorporated a feedback mechanism into the website (see Figure 8).

Even though this is a common element on many websites, we need to emphasize that it was considered of special significance in this design. Consultation with an Indigenous community has been recognized as a continuous two-way process (AIATSIS 2000), and so a feedback system was an important way to encourage the sharing of ideas among the extended website community.

3.8 Informal Language and Humour

The importance of adapting language to local styles is suggested for localisation (Amara and Portaneri 1996, Callahan 2005). In particular, Aboriginal students are said to often prefer simple, “straight to the point” and easy to read English (Gibb 2006). However, our choice of informal language and humour (see Figures 2, 5, 6 and 8) is somewhat at odds with the more traditional image projected by university websites. Indeed there were some disagreements within the group over the use of “uneducated” phrasing and the simplified expressions that were incorporated into the design. However, in general, the sense of informality and fun associated with the school was considered more important than the reinforcement of academic reputation. Therefore the intent in the design was always to keep the language very informal and simple. This informality was reinforced by the use of a casual, handwritten font.


3.9 Traditional Imagery and Ceremony

The design incorporated custom dot images and earthy colours that are strongly identified with traditional Indigenous culture (see Figure 2). We are aware that simple things, such as colour, can affect the user’s expectations and overall satisfaction (Barber and Badre 1998). Likewise, the focus group wanted to see Aboriginal art on the website as it would immediately identify the site as Indigenous. Other traditional elements such as singing and dancing are often used to help teach in traditional Aboriginal society (Fischer, 1995).

3.10 Indigenous Wiki

The idea of a wiki for knowledge construction was the most novel design element we tried to include in the website. Much has been written about the need to incorporate contemporary Aboriginal knowledge into such projects and to stress the involvement of the Indigenous people in the development of this knowledge. The idea is that the user should be able to ‘perform knowledge’, that is, to actively participate in knowledge construction rather than merely accessing and manipulating what is provided (Pumpa and Wyeld 2006). We had identified a similar requirement from our focus group.

However, the question of how to represent and evolve knowledge in an “Indigenous” versus a “Western” way proved complex and still requires more investigation. While the wiki approach used in the first prototype was thought to be a good idea, the text-based interaction was perceived as complex and difficult for non-technical users. To address this we expect a more visual representation, much like a graphical Multi-User Dungeon (MUD), which focuses on situated visual objects, could provide a solution. A MUD provides an extensible database of people, places and things that users can interact with (Woodruff and Waldorf 1995). Within the scope of this project we did not develop this design feature further.

4 Hofstede’s Model

The original Hofstede model was composed of four distinct dimensions that were said to categorize national culture (Hofstede 2005). After some criticism derived from a comparison of values for cross-cultural students (Hofstede and Bond, 1988), a further dimension was added (see Table 1). Thus five cultural dimensions were defined:

• Power Distance
• Individualism
• Masculinity
• Uncertainty Avoidance
• Long-term Orientation.

4.1 Hofstede’s Cultural Dimensions

The power distance index is related to the extent that power is distributed in the culture’s society. Higher values indicate that power is exercised centrally from above, while lower values indicate a more even spread of power through all levels of society. In low power distance cultures, there is less distinction placed on position in a hierarchy. Management may be less revered and equality of decision-making may be expected. This extends to family situations, where children are also treated as equals.

The individualism measure relates to whether people function in larger, strongly cohesive social groups as opposed to smaller individual and tight family groupings. This is also described as individualism versus collectivism. We might typically associate Asian cultures with a lower value of individualism compared to western cultures such as America and Australia, where personal pursuits tend to override the achievements of the group.

The masculinity index is intended to estimate the way roles are distributed between genders in the culture. Female values were found not to vary greatly between cultures while male attitudes did. In high masculinity countries traditional distinctions between gender roles are enforced. Value is often placed on social recognition, competition and advancement. By contrast, in low masculinity cultures, male and female attitudes, roles and values can be very similar. Values surrounding quality of life, modesty and equality are more prevalent.

The uncertainty avoidance dimension indicates the culture’s tolerance for ambiguity and uncertainty. Cultures that measure low on this dimension place less emphasis on rules and regulations that attempt to enforce certainty. They accept less structure and are more tolerant of change.

The long-term orientation measure was added to the Hofstede model to measure the cultural importance placed on the future rather than the past and present. Values such as thrift and perseverance are associated with a high long-term orientation, while respect for tradition and meeting social obligations are important values for countries with lower measures on this index.

Cultural Group          Power Distance  Individualism  Masculinity  Uncertainty Avoidance  Long-term Orientation
Wollotuka                     15              15            -5               18                   -33
Australia (Indigenous)        80              89            22              128                   -10
Australia (western)           36              90            61               51                    31
Austria                       11              55            79               70                     -
Denmark                       18              74            16               23                    46
Ecuador                       78               8            63               67                     -
Greece                        60              35            57              112                     -
India                         77              48            56               40                    61
Japan                         54              46            95               92                    80
Malaysia                     104              26            50               36                     -
Netherlands                   38              80            14               53                    44
Sweden                        31              71             5               29                    33
United States                 40              91            62               46                    29

Table 1: Measures on the five Hofstede dimensions for a selection of cultural groups (Hofstede 2005).


We do note that Hofstede’s cultural model is not without criticisms; these include the use of an initial sample made up of employees from a single company and the question of how well these employees represent their national culture as a whole (Sondergaard 1994). Callahan provides a good review of this issue and the ongoing debate surrounding Hofstede’s model (Callahan 2005).

4.2 Hofstede’s Value Survey

Hofstede’s cultural model was originally derived from a survey of work-related values. The survey was completed by staff working for subsidiaries of IBM across 50 different countries between 1967 and 1973 (Hofstede 2005). An improved version of the value survey was made generally available in 1982 and many of the studies using Hofstede’s work relate to this original survey.

In 1994, when the model was extended to include the long-term orientation dimension, the values survey was also extended to include questions that measured values related to this new dimension. Even though later studies were carried out using this survey, data is not yet available for some countries along the fifth dimension.

The values survey was again revised in 2008 and questions related to two further dimensions were included for research purposes (Hofstede et al. 2008). These two new values were related to self-effacement and indulgence. In this study we used this 2008 version of the values survey. It includes 28 Likert-scale questions, four for each of the seven dimensions, and a further six questions that provide demographic information. Since the latest two dimensions are yet to be correlated in the model and little data exists in terms of web design, we will not consider them further in this study.

At the end of the project we surveyed the 12 subjects directly involved in the study along with 9 additional subjects who work at Wollotuka. Having completed the survey we calculated the dimensions using the described approach for the 2008 values survey (Hofstede et al. 2008). This resulted in low scores on all five of the Hofstede cultural dimensions. These calculated scores are shown in Table 1 along with scores previously measured or estimated for other national cultures. Note that scores are typically scaled to fall between 0 and 100, although lower and higher values are sometimes used when these dimensions are estimated. We have chosen not to apply any scaling and thus negative values are shown for two of the dimensions for Wollotuka.

Our sample size of 21 is at the bottom limit of the recommended sample size for the value survey. We also note that this survey is intended to measure national cultural variations and not cultural differences between smaller groups. Ideally we would only compare results against similar schools from a university population rather than comparing against results obtained from a different employment sector, namely IBM employees. As a result some caution must be applied in interpreting the results in too quantitative a fashion. To address this we do not assume a well-defined metric when comparing scores. Rather, we treat cultural dimensions in broad categories ranging from low to high. Our group falls into the low category for all five dimensions (see Table 2).
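Expressed as a simple lookup, the banding used in Table 2 can be illustrated as follows (a minimal C++ sketch; the function name is ours and not part of the survey methodology):

#include <string>

// Map a raw Hofstede dimension score to the broad bands used in
// Table 2. Unscaled scores outside 0-100 (such as the negative
// Wollotuka values) simply fall into the outer bands.
std::string band(double score) {
    if (score <= 20.0) return "<= 20";
    if (score < 40.0)  return "[20,40)";
    if (score < 60.0)  return "[40,60)";
    if (score < 80.0)  return "[60,80)";
    return ">= 80";
}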

Having placed our group with low measures in each of the five Hofstede dimensions, we then compared our ten key design features with expected results. To do this we turned to previous studies that delineate the expected design features for cultures with low scores in the power distance, individualism, masculinity, uncertainty avoidance and long-term orientation dimensions. Table 2 shows a list of low and high cultures for each of the Hofstede dimensions. We have selected countries that feature in previous studies (Callahan 2005, Dormann and Chisalita 2003). Table 2 also includes Australia, a previously estimated value for an Indigenous group in the Northern Territory, and the results from the Wollotuka survey.

Score on Hofstede Dimension

                      ≤20          [20,40)      [40,60)      [60,80)      ≥80
Power Distance        Wollotuka    Australia    Japan        Ecuador      Indigenous
                      Austria      Netherlands  USA          Greece       Malaysia
                      Denmark      Sweden                    India

Individualism         Wollotuka    Greece       Austria      Denmark      Indigenous
                      Ecuador      Malaysia     India        Sweden       Australia
                                                Japan        Netherlands  USA

Masculinity           Wollotuka    Indigenous   India        Australia    Japan
                      Denmark                   Greece       Austria
                      Netherlands               Malaysia     Ecuador
                      Sweden                                 USA

Uncertainty           Wollotuka    Denmark      India        Austria      Indigenous
Avoidance                          Malaysia     Australia    Ecuador      Greece
                                   Sweden       Netherlands               Japan
                                                USA

Long-term             Indigenous   USA          Netherlands  India        Japan
Orientation           Wollotuka    Australia
                                   Sweden

Table 2: A list of low and high countries as measured on the Hofstede cultural dimensions.

5 Design Features and Hofstede

A number of previous studies have used Hofstede’s dimensions to examine cultural variations in website design. For example, a series of structural and linguistic guidelines for each of the cultural dimensions has been suggested (Marcus & Gould 2000). Using frequency counts, a set of design features for each of Hofstede’s dimensions was identified in a study of 500 commercial websites across several cultures (Robbins and Stylianou 2002, Robbins and Stylianou 2010). Hofstede’s model has also been explicitly used in the study of university websites across cultures. A study of university websites found correlations between feminine values and the masculinity index in the low masculinity country of the Netherlands and the high masculinity culture of Austria (Dormann and Chisalita 2003). When Indian and American university websites were compared, differences in the design were measured in the three dimensions of uncertainty avoidance, individualism and long-term orientation (Rajkumar 2003).


Score on Hofstede Dimension

Power Distance
  Low:  Less structured access to information and shallow hierarchies. Less focus on expertise, authority, and official logos. Fewer access barriers. Photos of students. Images of both genders. (Marcus and Gould 2000) Images of public spaces and everyday activities. (Ackerman 2002)
  High: Significant emphasis on social and national order in symbols. Access restrictions. Photos of faculty. Photographs of leaders and monumental buildings. (Marcus and Gould 2000) Images of monuments. (Ackerman 2002) Symmetrically designed sites. (Callahan 2005)

Individualism
  Low:  Include socio-political achievements. Emphasize history and tradition. Emphasis on state of being. (Marcus and Gould 2000) Use of formal speech. (Rajkumar 2003) Images of groups and older people. (Callahan 2005)
  High: Frequent images of success. Personal information. Emphasis on action. (Marcus and Gould 2000) Frequent pictures of individuals. Direct address. Expression of private opinion. Individual success stories. (Rajkumar 2003) Images of individuals and the young. (Callahan 2005)

Masculinity
  Low:  Emphasis on visual aesthetics. Support cooperation and exchange of information. (Marcus and Gould 2000) Images of people laughing, talking or studying together. (Dormann and Chisalita 2002) Multiple choices. Orientation toward relationships. (Ackerman 2002) Figurative images. Black and white (two-tone) images. Pictures of women. (Callahan 2005)
  High: Focus on task efficiency. Navigation oriented toward exploration and control. Utilitarian graphics. Interactive elements like games and animations. (Marcus and Gould 2000) Emphasis on tradition and authority. Frequent images of buildings. (Dormann and Chisalita 2002) Limited choices. Orientation toward goals. (Ackerman 2002) Highly saturated colour images. Animated pictures. (Callahan 2005)

Uncertainty Avoidance
  Low:  More complex designs. Variety of choices. Long pages with scrolling. (Marcus and Gould 2000) Abstract images. (Ackerman 2002) Fewer links. Vertical page layout. Abstract images. Pictures of students and people. (Callahan 2005)
  High: Simple designs with clear metaphors. Restricted amounts of data. (Marcus and Gould 2000) Formal organization charts, rules, regulations, extensive legalese. (Rajkumar 2003) References to daily life. (Ackerman 2002) Horizontal page layout. More pictures of buildings. (Callahan 2005)

Long-term Orientation
  Low:  Emphasis on allowing the user to accomplish tasks quickly. (Marcus and Gould 2000) Few references to tradition. Emphasis on current events. Present clear strategic plans. (Rajkumar 2003)
  High: Emphasis on tradition and history. Provide archives of early photos and images of founders. Make frequent references to the distant future. (Rajkumar 2003)

Table 3: A list of design features expected for low and high values for each Hofstede cultural dimension. Green items were confirmed in our study, while red items are in disagreement with our outcomes.

The most thorough study in this area examined the similarities and differences between university websites from eight different countries: Austria, Denmark, Ecuador, Greece, Japan, Malaysia, Sweden and the USA (Callahan 2005). Callahan’s study examined the home pages of 20 universities from each of the eight countries. These countries were selected to represent low and high values on each of the four cultural dimensions. The study looked for correlations between the way specific design elements were used and the four original Hofstede dimensions. We have summarised the outcomes from each of these previous studies in Table 3 by showing the expected design features for low and high values on each of the cultural dimensions. In the following section we compare our own ten key design features against these previously expected results.

5.1 Power Distance Dimension

There are a number of good matches between our design features and the expectations of cultures with a low power distance measure. These include the unstructured and shallow navigation hierarchy of our design. The less formal, less authoritarian approach to layout and content is also as expected, and contrasts with the existing official homepage. The types of images requested focus on public spaces and include both genders. There was a request to try to balance the use of both faculty and student images, and this contradicts the expectation that student images would be preferred. This might be explained by the intention of the site to introduce staff members to prospective students, which is accomplished through videos of staff. Another contradiction of expectations relates to the desire to emphasise Indigenous symbols, through use of the Indigenous flag and recognised aesthetics related to colour and abstract dot paintings.

5.2 Individualism

Again there were some good matches between our design features and the expectations of cultures with a low individualism measure. The emphasis on traditional representations and ceremony in the design was confirmed. There was also a request to include as many community-based photos as possible, although there was no strong preference for younger or older people but rather that cross-generational images be equally represented. Once again, personal information about individual faculty was part of the task requirements of the site, and the intention was that this be a less formal, more real-world interaction. Individual success stories were included with the very clear intent of providing role models for younger people. The largest disagreement between our own design and expectations was the use of informal language on the website. This was one design feature that received much debate during the design process. In the end the intention was to use informal language to make the website more generally accessible to the community and less authoritarian.

5.3 Masculinity

The best two matches between our design features and the expectations of cultures with a low masculinity measure were the strong emphasis on simple aesthetics, namely the earthy colours and simple font. The overall emphasis on community relationship building and inclusive features such as the feedback mechanism are also predicted by previous studies. However, there are also a number of design features that have been suggested to belong to high masculinity cultures. These include navigation oriented toward exploration, images of buildings, an emphasis on tradition and interactive elements like games and walkthroughs. The reason for including computer games in the site was to appeal to a younger generation and project an image of fun. The disparity with the other features relates to fundamental ideas of knowledge representation in Indigenous culture, which involves the exploration of things in the context of places, including, in this case, landscapes and buildings.

5.4 Uncertainty Avoidance

Our website featured a single long page with scrolling. This matches well with previous expectations for cultures with a low uncertainty avoidance measure. Other features, such as the vertical page layout, the use of abstract images and the inclusion of community pictures involving students, also match expectations. However, a few expected design features are at odds with a low uncertainty avoidance culture. For example, we might expect a more complex design with a large variety of choices. By contrast, the simple, clear metaphors used in our design have been suggested to be more appropriate for cultures that score high on this dimension.

5.5 Long Term Orientation

Long-term orientation is the least studied of the Hofstede dimensions. There were no good matches between our design features and the expectations of cultures with a low score on this dimension. It is suggested that websites for low scoring cultures will focus on fast, efficient task execution. However, our website promotes a slower, explorative navigation by way of the interactive media. There was also a strong preference in our design to focus on and highlight traditional practices. This is directly at odds with expectations from the literature.

6 Conclusion

Hofstede’s dimensions have previously been used a number of times to analyse the impact of culture on web design. In all these cases the websites were evaluated blindly, with no specific knowledge about how culture actually impacted on the design decisions. Our study is therefore unique in that the design features were first identified using a protracted ethnographic process that was intended to capture elements that best represented the culture of the group. It was at the conclusion of the project that we measured the group’s cultural position using Hofstede’s cultural dimension survey. This allowed us to compare the actual outcomes in terms of the web design with the expected outcomes as suggested by previous work. The question we addressed was: “For our group, as positioned using Hofstede’s cultural dimensions, do the identified design features match with expectations that have been previously reported?”

The answer to this question was mixed. Our group measured low on the five Hofstede dimensions and some of our design features correlated well with expected outcomes. The best correlations occurred on the power distance index, where the navigation, organisation and image content conformed with expectations. However, a number of contrary results were also found, in particular the use of informal language with a low individualism score and the focus on tradition with a low long-term orientation.

As previous authors have indicated (Ess and Sudweeks 2005), while design features associated with the Hofstede dimensions provide useful input to the cultural design process, they do not provide straightforward, definitive design solutions. Rather, variations in individual groups still need to be catered for in the design process.

7 References

Ackerman, S. K., (2002): Mapping User Interface Design to Culture Dimensions. Paper presented at the International Workshop on Internationalization of Products and Systems, Austin TX, USA.

AIATSIS. (2000): Australian Institute of Aboriginal and Torres Strait Islander Studies. Guidelines for Ethical Research in Indigenous Studies. Canberra.

Amara, F. and Portaneri, F. (1996): Arabization of graphical user interfaces. In International User Interfaces. 127-150. del Galdo, E. and Nielsen, J. (eds), Wiley, New York.

Auld, G. (2007): Talking books for children's home use in a minority Indigenous Australian language context. Australasian Journal of Educational Technology. 23(1): 48-67.

Barber, W. and Badre, A., (1998): Culturability: The merging of culture and usability. In Proceedings of the 4th Conference on Human Factors and the Web.

Bell, P., (2001): Content analysis of visual images. In Handbook of Visual Analysis. 92-118. van Leeuwen & Jewitt, C. (eds.), London: Sage.

Buchtmann, L., (1999): Digital Songlines - The Use of Modern Communication Technology by an Aboriginal Community in Remote Australia. University of Canberra. Canberra, Australia.

Callahan, E., (2005): Cultural similarities and differences in the design of university websites. Journal of Computer-Mediated Communication. 11(1).

Dormann, C., and Chisalita, C. (2002): Cultural Values in Website Design. In Proceedings of the 11th European Conference on Cognitive Ergonomics.

Dong, Y. and Lee, K.P., (2008): A cross-cultural comparative study of users' perceptions of a webpage: With a focus on the cognitive styles of Chinese, Koreans and Americans. International Journal of Design, 2(2):19-30.

Ess, C., and Sudweeks, F., (2005): Culture and computer-mediated communication: Toward new understandings. Journal of Computer-Mediated Communication. 11(1).

Faiola, A. and Matei, S.A. (2005): Cultural cognitive style and web design: Beyond a behavioral inquiry into computer-mediated communication. Journal of Computer-Mediated Communication, 11(1).


Fernandez, N.C. (2000): Web Site Localisation and Internationalisation: a Case Study. City University.

Fischer, R.A., (1995): Protohistoric roots of the network self: On wired aborigines and the emancipation from alphabetic imperialism. Zurich.

French, W.L., and Bell, C.H., (1979): Organization Development: Behavioral Science Interventions For Organization Improvement. New Jersey: Prentice Hall.

George, R., Nesbitt, K., Gillard, P., and Donovan, M., (2010): Identifying Cultural Design Requirements for an Australian Indigenous Website. In Conferences in Research and Practice in Information Technology, 106: 89-97. Calder, P., and Lutteroth, C. (eds). CRPIT.

George, R., Nesbitt, K., Donovan, M., and Maynard, J. (2011): Focusing on Cultural Design Features for an Indigenous Website. In proceedings of 22nd Australasian Conference on Information Systems. Sydney.

Gibb, H., (2006): Distance Education and the Issue of Equity Online: Exploring the Perspectives of Rural Aboriginal Students. The Australian Journal of Indigenous Education. 35:21-29.

Goransson, B., Gulliksen, J., and Boivie, I. (2003): The Usability Design Process – Integrating User-centered Systems Design in the Software Development Process. Software Process Improvement and Practice, 8:111-131.

Hofstede, G., (2005): Cultures and Organizations: Software of the Mind. 2nd ed., McGraw-Hill, London.

Hofstede, G. & Bond, M.H., (1988): The Confucius connection: from cultural roots to economic growth. Organizational Dynamics, 16(4):4-21.

Hofstede, G., Hofstede, G. J., Minkov, M., and Vinken, H. (2008): Values Survey Module 2008 Manual. http://stuwww.uvt.nl/%7Ecsmeets/ManualVSM08.doc Accessed 24 Aug 2011.

Marcus, A., and Gould, E.W., (2000): Cultural dimensions and global Web user-interface design: What? So what? Now what? http://www.amanda.com/cms/uploads/media/AMA_CulturalDimensionsGlobalWebDesign.pdf Accessed 8 Feb 2011.

Nielsen, J. (1993): Usability Engineering. Morgan Kaufmann, San Diego, CA.

Pumpa, M. and Wyeld, T.G., (2006): Database and Narratological Representation of Australian Aboriginal Knowledge as Information Visualisation using a Game Engine. In Tenth International Conference on Information Visualization. IEEE Computer Society.

Pumpa, M., Wyeld, T.G. and Adkins, B., (2006): Performing Traditional Knowledge Using a Game Engine: Communicating and Sharing Australian Aboriginal Knowledge Practices. In The Sixth International Conference on Advanced Learning Technologies. IEEE Computer Society Press.

Rajkumar, S., (2003): University Web Sites: Design Differences and Reflections of Culture. Unpublished manuscript, School of Library and Information Science, Indiana University Bloomington.

Robbins, S.S. and Stylianou, A.C., (2002): Global corporate web sites: an empirical investigation of content and design. Information and Management, 40(3): 205-212.

Robbins, S.S. and Stylianou, A.C., (2010): A longitudinal study of cultural differences in Global Corporate Web Sites. Journal of International Business and Cultural Studies, 3:77-96.

Rosenzweig, G., (2011): ActionScript 3.0 Game Programming University, 2nd ed. Que, USA.

Schadewitz, N. (2008): Design Patterns for Cross-cultural Collaboration. International Journal of Design, 3(3), 37-53.

Shannon, P., (2000): Including language in your global strategy for B2B e-commerce. World Trade, 13(9): 66-68.

Singh, N. and Pereira, A., (2005): The culturally customized Web site: Customizing Web sites for the global marketplace. Elsevier Butterworth-Heinemann, Burlington, MA.

Sondergaard, M. (1994): Hofstede's consequences: A study of reviews, citations and replications. Organization Studies 15(3):447.

Tixier, M., (2005): Globalization and Localization of Contents: Evolution of Major Internet Sites Across Sectors of Industry, Thunderbird International Business Review, 47(1):15-48.

Turk, A., and Trees, K., (1998): Culture and Participation in the Development of CMC: Indigenous Cultural Information System Case Study. 263-267. In Ess, C. and Sudweeks, F. (eds.), Proceedings Cultural Attitudes Towards Communication and Technology. University of Sydney, Australia.

Williams, M., (2002): Reach In - Reach Out Project, Indigenous Education and Training Alliance, ACEC 2002, In proceedings of the Australian Computers in Education Conference, Sandy Bay, Tasmania.

Williamson, D., (2002): Forward from a critique of Hofstede's model of national culture. Human Relations, 55(11) 1373-1395.

Wollotuka, (2008), The Wollotuka Institute, The University of Newcastle. http://www.newcastle.edu.au/institute/wollotuka/. Accessed 8 Jun 2011.

Woodruff, M. and Waldorf, J., (1995): Multi-User Dungeons Enter A New Dimension: Applying Recreational Practices For Educational Goals. The Electronic Journal of Communication, 5 (4).

Würtz, E., (2005): A cross-cultural analysis of websites from high-context cultures and low-context cultures. Journal of Computer-Mediated Communication, 11(1).

Yuan, X., Liu, H., Xu, S., and Wang, Y., (2005): The impact of different cultures on e-business Web design–Comparison research of Chinese and Americans. In Proceedings of the 11th International Conference on Human-Computer Interaction. Las Vegas.


Graphics Group, Department of Computer Science, University of Auckland
[email protected], [email protected], [email protected], [email protected]

The interaction with 3D scenes is an essential requirement of computer applications ranging from engineering and entertainment to architecture and social networks. Traditionally 3D scenes are rendered by projecting them onto a 2-dimensional surface such as a monitor or projector screen. This process results in the loss of several depth cues important for immersion into the scene. An improved 3D perception can be achieved by using immersive Virtual Reality equipment or modern 3D display devices. However, most of these devices are expensive and many 3D applications, such as modelling and animation tools, do not produce the output necessary for these devices. In this paper we explore the use of cheap consumer-level hardware to simulate 3D displays. We present technologies for adding stereoscopic 3D and motion parallax to 3D applications without having to modify the source code. The developed algorithms work with any program that uses the OpenGL fixed-function pipeline. We have successfully applied the technique to the popular 3D modelling tool Blender. Our user tests show that stereoscopic 3D improves users’ perception of depth in a virtual 3D environment more than head coupled perspective. However, the latter is perceived as more comfortable. A combination of both techniques achieves the best 3D perception, and has a similar comfort rating to stereoscopic 3D.

Keywords: stereoscopic 3D, anaglyphic stereo, 3D display, head tracking, head coupled perspective

In conventional applications, 3D scenes are rendered through a series of matrix transformations, which place objects in a virtual scene and project them towards a view plane. The resulting 2D images look flat and unrealistic because several depth cues are lost during the projection. Two important examples of such cues are binocular parallax and motion parallax. These two depth cues are equally relevant when perceiving depth in a 3D environment (Rogers & Graham 1979). Hence it is desirable to re-create them to enhance the realism and presence in 3D scenes.

Binocular parallax is the difference of images seen by each eye when viewing a scene, creating a sense of depth. In electronic media, this can be re-created using stereoscopic 3D (S3D) techniques, where different images are presented to each eye through a filtering mechanism. This is usually accomplished via 3D glasses worn by the user, e.g. when viewing 3D movies in the cinema. In computer applications there are several widely available implementations of stereoscopy, namely NVidia’s 3D Vision Kit (NVidia 2011) and customised graphic drivers, such as iZ3D (iZ3D Software 2011) and TriDef (Dynamic Digital Depth 2011).

Motion parallax is the difference in the positions of objects as the viewer moves through the scene. When the viewer moves in a straight line, objects further away move less than those closer by. This effect can be re-created using a technique known as head coupled perspective (HCP). However, there is currently no widely available solution for implementing this enhancement in the consumer market.

Another motivation for implementing HCP is the relative costs of hardware. Typical implementations of S3D require specialised glasses and monitors (Sexton & Surman 1999). For example, the NVidia 3D Vision Kit and the required specialised monitor capable of 120Hz refresh rate costs over $500 (NVidia 2011). With HCP, only a head tracker is required, which can be implemented with a $30 web camera. This is not only more affordable for general users, but web-cams are already widely used for other applications, such as Skype and social media, and are increasingly integrated into display devices. Hence the majority of users would not have to spend any additional money for such a set-up.

This paper presents a 3D display solution using anaglyphic stereo and head coupled perspective with cheap consumer-level equipment. We investigate the benefit of HCP and S3D for depth perception, confirming some of the previous results in this area as well as producing new results. In addition, methods of integrating HCP with existing rendering engines are presented, which will make this technology available to a wide range of users.

Section 2 reviews previous work investigating the use of S3D and HCP. Section 3 summarises virtual reality and head tracking technologies relevant to the design of our solution. Sections 4 and 5 describe how we achieve stereoscopy and HCP for general OpenGL applications. Section 6 evaluates our solution in terms of improving depth perception and comfort. We draw conclusions in Section 7 and give an outlook on future work in Section 8.

The term fish-tank virtual reality (Ware et al. 1993) has been used to describe systems which render stereoscopic images to a monitor with motion parallax using head coupled perspective. The original implementation relied on an armature connected to the user’s head, which is impractical for widespread adoption. This approach to VR was proposed as an alternative to head mounted displays (HMDs) as it offers several benefits, including significantly better picture quality and less of a burden on the user. The authors’ user testing found that pure HCP was the most preferred rendering technique, although users performed tasks better when both stereo rendering and HCP were used.

Techniques that use cameras for head tracking, removing the need for the user to wear head-gear, have also been developed (Rekimoto 1995). The use of the vision-based tracking system instead of the physical one for head tracking did not degrade the performance of the system, despite the fact that the viewer’s distance from the screen was not calculated. Face tracking was performed by subtracting a previously obtained background image and then matching templates to obtain the location of the viewer’s face in real time.

The above techniques rely on generating the appropriate images frame-by-frame depending on the position of the viewer. This makes them inappropriate for presenting live-action media, which must be recorded beforehand. Suenaga et al. (2008) propose a technique that captures images for a range of perspectives and selectively displays the one most appropriate for the viewer’s position. This is however infeasible for video as hundreds of images must be captured and stored for each frame in order to support a large number of viewing orientations.

Several methods have been developed to enhance the quality of fish-tank VR. An update frequency of at least 40Hz and a ratio of camera-to-head movement of 0.75 have been found to provide the most realistic effect (Runde 2000). A virtual cadre can also be employed to improve the depth perception of objects close to or in front of the display plane, while amplifying head rotations can allow a viewer to see more of a scene, which improves immersion (Mulder & van Liere 2000).

Yim et al. (2008) found that head tracking was intuitive when implemented into a bullet dodging game. Users experienced higher levels of enjoyment during game-play. The implementation used the Nintendo Wii-remote setup described by Lee (2008). One downside of the setup is the sensor bar attached to the head, which was cumbersome and received some negative feedback. This highlights the importance of unobtrusive enhancement implementations.

Sko and Gardner (2009) used the fish-tank VR principle to augment Valve’s Source game engine. Head coupled perspective along with amplified head rotations were integrated as passive effects, while head movement was also used to perform various in-game tasks such as peering, iron-sighting and spinning. Stereo rendering was, however, not performed. User tests found that the amplified head rotations “added life to the game and made it more realistic”, and while the concept of HCP was liked by the users, limitations regarding the latency and accuracy of the head tracking degraded the experience.

These findings suggest that head coupled perspective is an important part of recreating a realistic scene in virtual reality, and that it can improve spatial reasoning and help users perform tasks more quickly and efficiently (Ware et al. 1993). Therefore creating a method that can reliably upgrade 3D computer graphics pipelines to render fish-tank VR could have significant positive impacts on visualising data, computer modelling and gaming without the need for expensive dedicated VR equipment.

Virtual Reality (VR) is a broad term that can be used to identify technologies that improve the user’s sense of virtual presence, or immersion in a virtual scene. Complete immersion involves manipulating all the user’s senses. Our research focuses on improving visual immersion. Hence technologies such as interaction and haptic and aural feedback will not be investigated.

Current VR display technologies are divided into three main categories: fully immersive, semi-immersive and non-immersive. Fully immersive systems, such as head mounted displays (HMD) and CAVE, are known to improve immersion into a 3D environment (Qi et al. 2006). However, they involve high costs and cumbersome setups. With an HMD it is impossible to quickly switch between virtual reality and real life as the user is required to wear some kind of head gear (Rekimoto 1995). These disadvantages prevent widespread adoption of such systems in everyday life.

Non-immersive VR presents the opportunity for adoption in everyday situations because of their unobtrusive design and availability of inexpensive implementations. HCP and S3D techniques are classed as non-immersive techniques because the user views the virtual environment through a small window, usually a desktop monitor (Demiralp et al. 2006). Table 1 summarises some common VR display technologies.

Display               Immersion  Presence  Image quality  Cost
Desktop               Non        Low       High           Low
Fish-tank             Non        Medium    High           Low
Projection            Semi       Medium    Medium         Medium
HMD                   Full       High      Low            Medium
Surround (e.g. CAVE)  Full       High      Medium         High

Table 1: Comparison of VR display technologies (Nichols & Patel 2002).

Most VR displays are 2D surfaces. In order to accurately represent a 3D scene, depth cues lost in the 3D to 2D conversion process must be recreated. Several of these cues can be represented in 2D images without special devices. Examples of depth cues emulated by most modern graphics engines are distance fog, depth-of-field and lighting and shading.

Motion and binocular parallax cannot be recreated passively on standard display systems. However, through the use of head coupled perspective and stereoscopy these cues can be artificially created.

Stereoscopy refers to the process of presenting individual images to each of the viewer’s eyes. When rendering scenes with slightly different perspectives this process simulates binocular vision. The differences in the perceived images are used by the brain to determine the depth of objects, a process known as stereopsis. The most commonly available methods of displaying stereoscopy are anaglyphs, polarised displays, time multiplexed displays and autostereoscopic displays.

Anaglyphs encode the images for each eye in the red, green and blue channels of the image. The user needs to wear glasses that selectively filter certain channels. There are several combinations of channels in use with the most popular being red/cyan.

Polarised displays work by polarising the individual images in different directions, while the user wears glasses with polarised lenses which block the images with the opposite polarisation. Time multiplexed displays work by displaying the different images alternatively while the glasses alternate which eye receives the image. Autostereoscopic displays work by directing the images from the screen to each eye using a surface covered in tiny lenses or parallax barriers. Table 2 illustrates some of the main differences between the technologies.

Technology   Resolution  Colour quality  Comfort  Glasses cost
Anaglyph     High        Poor            Low      Low
Polarized    Half        Good            High     Low
Shutter      High        Good            Medium   Medium
Autostereo   Half        Good            High     N/A

Table 2: Comparison of stereoscopic display technologies (Fauster 2007).

Implementing stereo rendering is difficult to add externally to the rendering pipeline as it requires draw calls to be duplicated and selectively modified. For this reason it was decided to use an existing program to add stereoscopic rendering. The two programs that were tested are the iZ3D driver (iZ3D Software 2011) and NVidia’s 3D Vision driver (NVidia 2011). While this will not allow fine-tuned control over the stereo rendering, it ensures compatibility with the wide range of 3D displays available. Care must be taken to ensure the external stereo functionality does not interfere with the head coupling technique. This will be accomplished by ensuring that the algorithms implementing this functionality have different entry points to the rendering pipeline (Gateau 2009).

Head coupled perspective (HCP) is a technique used to emulate the effect of motion parallax on a 2D display. HCP is implemented for 3D rendering applications by projecting virtual objects’ vertices through the screen plane to the location of the viewer. The point on the screen plane that intersects the line between the object and the viewer is where the object is drawn on the display. This projection is typically performed through a series of matrix multiplications with the object’s vertices. Normally the view point is a virtual camera inside the scene that corresponds to a static point in front of the display. This, however, does not take into account the motion of the user’s head, and so the projection becomes incorrect when the user’s actual viewing position differs from the assumed position. Figure 1 shows how motion parallax causes an object to appear in a different location on the screen when viewed from a different position.

HCP works by coupling the position of the user’s head to the virtual camera such that the user’s head movements in front of the display cause proportional movements of the virtual camera in the scene. The ratio of head-to-camera movement is referred to as the gain of motion parallax, and the value that gives the most realistic effect varies from person to person (Runde 2000).

Figure 1: Diagram illustrating how the correct projection of a virtual object to a surface changes depending on the viewing position.
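To make the coupling concrete, the following sketch shows how such an off-axis projection can be set up in the OpenGL fixed-function pipeline. This is an illustrative reconstruction rather than the paper's actual code: it assumes a head position (ex, ey, ez) in screen-centred coordinates with ez > 0, and a physical screen of width w and height h in the same units.

#include <GL/gl.h>

// Build an asymmetric viewing frustum whose apex sits at the tracked
// head position (ex, ey, ez), given relative to the screen centre.
// w, h: physical screen size; zNear, zFar: clipping planes.
void headCoupledProjection(double ex, double ey, double ez,
                           double w, double h,
                           double zNear, double zFar) {
    double s = zNear / ez;  // scale screen extents back to the near plane
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glFrustum((-w / 2 - ex) * s, (w / 2 - ex) * s,
              (-h / 2 - ey) * s, (h / 2 - ey) * s,
              zNear, zFar);
    // Translate the scene so the frustum apex coincides with the
    // viewer's actual eye position.
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glTranslated(-ex, -ey, -ez);
}

A motion-parallax gain other than 1.0 can then be applied by scaling the tracked head offsets before they are passed in.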

An adequate head tracker is needed for an effective implementation of HCP. We therefore evaluated the temporal error, spatial error and latency of head trackers.

Visual head trackers are most suitable for our research because of their low hardware costs and unobtrusive nature. A NZ$45 Logitech C500 web camera was used with computer vision techniques, which extract the position of the user’s head. The web camera operates at a VGA resolution of 640 by 480 pixels at 30 fps. Implementing face detection and tracking from scratch is very complicated if accurate and reliable tracking is desired. Therefore, we evaluated the tracking performance, with and without anaglyph glasses, of freely available APIs that can be integrated directly with as little modification as possible.


The FaceAPI library (Seeing Machines 2010) was first evaluated due to the fast response and excellent accuracy seen in Sko’s demonstration videos (Sko 2008). When tested without stereoscopic glasses, the FaceAPI was able to track up to 1.5m in range. It could also handle very fast head movements and the latency was unnoticeable.

However, it encountered some difficulty when tracking users wearing anaglyph glasses. In some rare instances, the user’s face could not be detected at all. For most of the time, the position of the eyes was shifted to the lower edge of the glasses. This is shown in Figure 2, where the yellow outline represents the predicted positions of facial features. As the tracking with anaglyph glasses is inadequate, the ARToolkit library was investigated.

Figure 2: Inaccurate tracking with FaceAPI when anaglyph glasses are worn. The predicted face positions are indicated by the yellow outline.

Fiducial marker tracking was found to be the most suitable alternative because a paper marker is sufficient to track 6 degrees of freedom. The marker can be attached to the anaglyph glasses without affecting the user. ARToolkit is an open source library designed to track fiducial markers, such as the ones shown in Figure 3 (ARToolworks 2011).

Figure 3: Example fiducial markers used with ARToolkit.

The library was able to detect fast motions and had little latency. The main drawback was the restricted distance range and marker size. Relatively large markers are required for a good tracking range. With the limited space on the anaglyph glasses, the maximum marker size without additional installations was 3.5 cm by 3.5 cm. We found that a 3.5 cm marker allowed reliable tracking up to a distance of 60 cm. If the user moved further away the tracking became jittery and unstable.

The cause of the limitation was identified as the template matching stage of the tracking algorithm (ARToolworks 2011b). At long ranges, the inside of the marker became too small to be successfully matched.

An attempt was made to develop a simple marker tracker with OpenCV (Willow Garage 2011). The tracker would not need rigorous template matching because exact marker identification is not necessary for head tracking. Hence, detection of smaller markers is possible and the range can be extended to fit the requirements.

The pre-processing stage performs image conditioning and filtering operations. A bilateral filter was applied to smooth the image while retaining the sharpness of the edges. Histogram equalisation was used to increase the contrast of the image.

Contours were extracted from an edge image created using the Canny edge detector. The thresholds for the detector were changed dynamically to keep the number of detected contours within a reasonable range. This was necessary to keep the processing time for one frame roughly constant.

Squares were detected and stored using a polygon approximation technique. They were then normalised and template-matched against an empty inner pattern to determine the likelihood of each being a valid marker. The two most likely markers were used for pose estimation.
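The core of this pipeline can be sketched with OpenCV's C++ API as below. This is an illustrative reconstruction rather than the original implementation (we use the current cv:: API, the filter and threshold parameters are assumptions, and cannyHigh stands for the dynamically adjusted Canny threshold mentioned above):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// One frame of the marker-candidate stage: condition the image,
// extract contours from a Canny edge map, and keep convex quads.
std::vector<std::vector<cv::Point>> findMarkerCandidates(
        const cv::Mat& frame, double cannyHigh) {
    cv::Mat gray, smooth, edges;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::bilateralFilter(gray, smooth, 9, 75.0, 75.0); // smooth, keep edges sharp
    cv::equalizeHist(smooth, smooth);                 // boost contrast
    cv::Canny(smooth, edges, cannyHigh / 2.0, cannyHigh);

    std::vector<std::vector<cv::Point>> contours, quads;
    cv::findContours(edges, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
    for (const auto& c : contours) {
        std::vector<cv::Point> poly;
        cv::approxPolyDP(c, poly, 0.02 * cv::arcLength(c, true), true);
        if (poly.size() == 4 && cv::isContourConvex(poly) &&
            std::fabs(cv::contourArea(poly)) > 100.0)
            quads.push_back(poly);  // candidate marker outline
    }
    return quads;  // candidates are then normalised and template-matched
}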

Some optimisation was performed to reduce the computation time required. The region of interest of the image was limited once the marker had been detected, on the assumption that the marker does not disappear instantly.

This algorithm was able to reliably detect stationary markers up to a distance of 1.2 m. However, motion blur and processing time were the two major problems which caused faulty detections for moving markers.

The amount of motion blur from the webcam caused the contours to break whenever the marker moved. Even with the shortest exposure setting, the black edges of the markers were smeared by the white regions surrounding them. Different markers were tested, but without any success.

Performance was another issue which prevented further development of the marker tracker. Even with a limited region of interest, the total processing time for one frame was 60 ms (see Table 3), which did not allow smooth tracking at 30 frames per second.

We therefore found that the free version of FaceAPI is the most suitable software for implementing HCP.

Grab image                     1 ms
Pre-processing                43 ms
Contour extraction             9 ms
Marker and pose extraction     3 ms
Total execution               60 ms

Table 3: Execution times for each stage of the OpenCV marker tracking algorithm.


Stereoscopic 3D was implemented with the anaglyph technique, which relies on colour channels to selectively filter the image presented to each eye. The advantage of anaglyph 3D is the low cost of the hardware: no special monitor is required and the coloured glasses cost approximately NZ$1 per pair. The OpenGL library has native support for rendering anaglyph 3D images via the glColorMask() function. The scene is rendered twice, once from the correct perspective of each eye, to replicate binocular parallax. The perspective corrections are performed identically to HCP, as described in section 5. The difference is that the scene is rendered once from each eye on different colour channels, and the results are blended together.

Different colour combinations were tested to determine the pair which gives the least amount of ghosting on the screen. The ghosting depends on the saturation and hue of the colour output of the monitor. Since this is a hardware limitation, it cannot be fixed by making adjustments to the screen or program. The colour pairs tested were: red-cyan, red-blue, and red-green. Red-blue gave the least amount of ghosting but caused a shimmering effect because of the high contrast between the two eyes. Red-cyan had the best colour but also the most ghosting. Red-green was chosen as it had only minor ghosting with minor shimmering.

Since the scene is rendered twice in every frame, care has to be taken that the scene is not too complex and an acceptable frame rate is achieved. The time delay between head movement and image update has a significant effect on the user's depth perception when the delay is 265 ms or greater (Yuan et al. 2000). Most graphics applications are designed to have a frame rate of at least 30 frames per second, so this problem is unlikely to occur in practice.
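
The two-pass, channel-masked rendering described above can be sketched as follows; drawScene() and the eye enumeration are assumptions, and the red-green channel assignment mirrors the chosen glasses.

#include <GL/gl.h>

enum Eye { LEFT_EYE, RIGHT_EYE };
void drawScene(Eye eye);  // assumed: renders with the per-eye perspective

// Minimal sketch of anaglyph rendering with glColorMask(); not the
// authors' exact code.
void renderAnaglyphFrame() {
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    glColorMask(GL_TRUE, GL_FALSE, GL_FALSE, GL_TRUE);   // red channel: left eye
    drawScene(LEFT_EYE);

    glClear(GL_DEPTH_BUFFER_BIT);                        // keep colour, reset depth
    glColorMask(GL_FALSE, GL_TRUE, GL_FALSE, GL_TRUE);   // green channel: right eye
    drawScene(RIGHT_EYE);

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);     // restore full writes
}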

In order to make HCP available to a wide range of users it must be integrated into existing applications. Figure 4 shows the general layers of a 3D computer graphics application. Modifying the source code of an application or rendering engine, or developing plug-ins, is not an option, since such a solution is not general enough, adds a high level of complexity, and requires suitable access mechanisms. Since there are only two graphics libraries commonly used on desktop computers, OpenGL and Direct3D, it was decided to perform the integration at the graphics library level.

Figure 4: Hierarchy of program libraries for a normal 3D application.

Since the integration is done at the library level, where source code is not available, a technique known as hooking was employed. This term refers to techniques that intercept function calls made by another program. There are two ways of hooking: statically, by physically modifying the program's executable file before it executes, or dynamically, by modifying it at runtime. The second approach was chosen because graphics libraries are frequently updated. This would be problematic for static hooking, as the library would need to be modified every time an update occurs.

With dynamic hooking the hooking program consists of three sections: the injection, the interception and the application-specific code. The injection part of the program is responsible for getting the hooking program to run in the target's address space. The interception code reroutes function calls within the program to the application-specific code. The last section is the code specific to the application, in this case the head coupling algorithm. For the injection section, CBT-style hooking is used (Microsoft 2011). This type of hooking uses native Windows functions to inject a dynamic-link library (DLL) into the address space of processes which receive window events. When the DLL is injected, the operating system invokes the DllMain function, which starts the function interception.
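
A minimal sketch of the injection step using the Win32 hook API is given below; the exported hook procedure name ("CbtProc") is an assumption.

#include <windows.h>

// Installing a global WH_CBT hook causes Windows to load the hook DLL
// into every process that receives window events, at which point DllMain
// runs and can start the function interception.
HHOOK InstallGlobalCbtHook(HMODULE dll) {
    HOOKPROC proc = reinterpret_cast<HOOKPROC>(
        GetProcAddress(dll, "CbtProc"));  // assumed export name
    // Thread id 0 makes the hook global (system-wide injection).
    return proc ? SetWindowsHookExA(WH_CBT, proc, dll, 0) : nullptr;
}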

Because the target OpenGL library is a DLL, the functions are intercepted by modifying the import descriptor table (IDT) using the APIHijack library (Brainerd 2000). The IDT maps the names of the functions exported by the DLL to the addresses of their code. Whenever a program calls a function from the DLL, it first finds the address of the function by looking it up in the IDT. By changing the addresses in the IDT to point to the modified functions, calls to the original functions can be efficiently redirected with almost no overhead.
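
The redirection can be sketched as follows. The paper uses the APIHijack library for this; the condensed version below patches the import entries of a single module, omits error handling, and uses illustrative names, so it is not the authors' code.

#include <windows.h>
#include <cstring>

// Redirect a named import of 'module' (e.g. a glLoadMatrix function
// imported from opengl32.dll) to 'replacement'. Returns the original
// function pointer so the hook can forward calls to it.
FARPROC HookImport(HMODULE module, const char* dllName,
                   const char* funcName, FARPROC replacement) {
    auto base = reinterpret_cast<BYTE*>(module);
    auto dos  = reinterpret_cast<IMAGE_DOS_HEADER*>(base);
    auto nt   = reinterpret_cast<IMAGE_NT_HEADERS*>(base + dos->e_lfanew);
    auto dir  = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto desc = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(base + dir.VirtualAddress);

    FARPROC target = GetProcAddress(GetModuleHandleA(dllName), funcName);
    for (; desc->Name != 0; ++desc) {
        if (_stricmp(reinterpret_cast<char*>(base + desc->Name), dllName) != 0)
            continue;
        // Walk the resolved import address entries for this DLL.
        auto thunk = reinterpret_cast<IMAGE_THUNK_DATA*>(base + desc->FirstThunk);
        for (; thunk->u1.Function != 0; ++thunk) {
            if (reinterpret_cast<FARPROC>(thunk->u1.Function) != target)
                continue;
            DWORD old;
            VirtualProtect(&thunk->u1.Function, sizeof(thunk->u1.Function),
                           PAGE_READWRITE, &old);
            FARPROC original = reinterpret_cast<FARPROC>(thunk->u1.Function);
            thunk->u1.Function = reinterpret_cast<ULONG_PTR>(replacement);
            VirtualProtect(&thunk->u1.Function, sizeof(thunk->u1.Function),
                           old, &old);
            return original;
        }
    }
    return nullptr;
}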

An alternative method for function hooking, called trampolining, also exists. This technique is more flexible than modifying the IDT, as it works for functions not in a DLL; however, there is more overhead, as several redirections are needed (Hunt & Brubacher 1999).

Section 3.3 explained that the implementation of head coupled perspective relies on modifying the perspective transformation matrix. With OpenGL there are two different rendering pipelines in use, a fixed-function pipeline and a programmable pipeline. Each of these approaches uses a different method to load transformation matrices: in the fixed-function pipeline, functions load the matrices individually, while with the programmable pipeline the matrices are combined by the program and passed to OpenGL as a single transformation matrix. Because of this, modifying the projection transformation in the programmable pipeline is very difficult. For this reason only the fixed-function pipeline was modified to support head coupled perspective. The functions used to load the perspective projection matrix in the fixed-function pipeline are the glLoadMatrix functions and the helper functions glFrustum and gluPerspective. With the glLoadMatrix functions different types of matrices can be loaded, not only projection matrices. To ensure that the head coupling algorithm is only applied to projection transformations, the matrices loaded via glLoadMatrix are checked to see if they match the template shown in figure 5.

\[
\begin{pmatrix}
\cot(y/2)/r & 0 & 0 & 0 \\
0 & \cot(y/2) & 0 & 0 \\
0 & 0 & \frac{n+f}{n-f} & \frac{2nf}{n-f} \\
0 & 0 & -1 & 0
\end{pmatrix}
\]

Figure 5: Generic perspective projection matrix shown in row-major format, where y is the vertical field of view, r is the aspect ratio, n is the distance of the near clip plane and f is the distance of the far clip plane.

Projection matrices are also used for other purposes such as shadow mapping, in which case the projection matrix must not be modified. In order to check the current use of the projection matrix we assume that the main camera projection is the only one that uses a non-square texture buffer. This is based on the assumption that the application runs in full-screen mode, which usually results in an aspect ratio of 4:3 to 16:9. Thus any projection matrix with an aspect ratio of 1 bypasses the head coupling algorithm.
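
A sketch of this test, assuming OpenGL's column-major matrix layout; the tolerance values are illustrative.

#include <cmath>

// Check whether a 4x4 matrix (column-major, as passed to glLoadMatrixf)
// matches the perspective template of figure 5 and has a non-square
// aspect ratio. m[11] holds the -1 of a perspective projection and the
// aspect ratio r equals m[5]/m[0].
bool isMainPerspectiveProjection(const float m[16]) {
    const float eps = 1e-5f;
    if (std::fabs(m[11] + 1.0f) > eps)        // template requires -1 here
        return false;
    const int zeros[] = {1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 15};
    for (int i : zeros)
        if (std::fabs(m[i]) > eps)            // all other entries must be zero
            return false;
    float aspect = m[5] / m[0];
    return std::fabs(aspect - 1.0f) > 0.01f;  // skip square (e.g. shadow map) buffers
}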

Conventional perspective transforms use a virtual camera position, field-of-view (FOV) and aspect ratio to determine how the scene is projected. With head coupled perspective the projection is determined by the head position and the position and size of a virtual window. While the head position is determined automatically, some method is needed to convert the virtual camera specified by the application into a virtual window. As the virtual camera corresponds to an assumed head position, a simple mapping is performed where the virtual monitor is placed at the same distance and size from the virtual camera as the real monitor is from the normal viewing position. Figure 6 illustrates this relationship.

This approach, however, has some disadvantages. One is that it does not always produce good results, as the scene can be at an arbitrary scale. For this reason the mapping parameters can be changed at runtime by the user to make the mapping more realistic. Another disadvantage is that zooming does not work in the application, as the virtual camera's FOV is ignored. Also, applications tend to have a large FOV so the user can see a large portion of the virtual world, but this process significantly reduces the effective FOV, giving the illusion of tunnel vision. These are inherent disadvantages of using a correct perspective projection. One potential way to mitigate them would be a hybrid approach that combines an approximation of head coupled perspective with a conventional virtual camera projection.

Figure 6: Diagram illustrating the initial mapping between the virtual camera and window compared to the physical viewer and monitor

The mapping process described above is performed whenever the loading of a valid projection matrix is detected. Using the calculated virtual monitor we create a new projection matrix, which is loaded instead of the original one.
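
As an illustration, such a matrix can be built as an off-axis frustum from the tracked head position and the virtual window; the parameterisation below is an assumption, not the authors' exact code.

#include <GL/gl.h>

// Load an off-axis perspective projection for a head position (hx, hy, hz)
// given in the coordinate system of a virtual window of width w and height
// h centred at the origin (hz > 0 is the head's distance from the window).
void loadHeadCoupledProjection(double hx, double hy, double hz,
                               double w, double h,
                               double zNear, double zFar) {
    double s = zNear / hz;  // scale window extents onto the near plane
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glFrustum(s * (-w / 2 - hx), s * (w / 2 - hx),
              s * (-h / 2 - hy), s * (h / 2 - hy),
              zNear, zFar);
    glTranslated(-hx, -hy, -hz);  // move the eye to the head position
    glMatrixMode(GL_MODELVIEW);
}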

A user study was performed to determine the effectiveness of the implemented HCP and S3D enhancements. Previous work using a customised set-up reported significant improvements in speed and accuracy when performing a tree tracing task with these enhancements (Arthur et al. 1993). In that work a head tracking armature was used, while shutter glasses (with a significant amount of cross-talk) were used for S3D. This is significantly different from the vision-based head tracker and anaglyph 3D used in our evaluation.

There has been no recent study directly comparing the effectiveness of the HCP and S3D enhancements. Hence, it is worthwhile to investigate whether the enhancements have different effects on users with our newer, cheaper, and less obtrusive technology.

An OpenGL test application was written for testing and recording depth perception in a virtual 3D environment. The scene was adapted from Marks (2011), who tested HCP for use in a virtual surgery simulation system. The test scene consists of 4 square plates inside a box, as illustrated in Figure 7. The plates were about 9 units wide and 1 unit thick. Participants had to determine the plate closest to them using the available depth cues. This was repeated 50 times using 4 different set-ups: no enhancement, HCP, S3D, and HCP & S3D. For each set-up the difficulty was progressively increased by linearly decreasing the maximum depth difference between the plates from 10 units to 3.3 units.

Figure 7: Screenshot of the user study application with the "no enhancement" set-up.

Selection of the closest plate was made using the '1', '2', '4' and '5' keys on the numeric keypad, which corresponded to the layout of the plates on screen. Users were allowed to give a "don't know" answer by pressing the '0' key.

The application recorded the participant's choice and the ordering of the plates to determine the accuracy of the user's response. The reaction time was determined by recording the time elapsed between the display and the selection of the plates. In addition, the length of time that head tracking was lost during each test was recorded in order to prevent distortion of the results.

Different depth cues were available in each set-up, as shown in Table 4. Note that in order to isolate the effect of binocular and motion parallax, most depth cues normally present in a 3D scene were intentionally removed. The scene shown in Figure 7 uses size as the only depth cue for the "no enhancement" set-up.

No enhancement    Size
HCP               Size & motion parallax
S3D               Size & binocular parallax
HCP & S3D         Size, motion parallax & binocular parallax

Table 4: Depth cues available in each set-up of the user test.

A set of pilot tests was performed with 5 participants and several problems were found with the initial test scene. Shading of the plates affected the subjects' depth perception. For some configurations the chosen lighting options resulted in the lower edges of the plates and the background having very similar colours, which made it difficult to judge size. All of these problems were fixed before beginning the user study.

The user study was performed with a Dell 2009W monitor and Logitech C500 webcam in a shared computer laboratory. Users were required to sit down while using the application.

Before beginning the test, each participant was given a briefing on the experiment. A pre-test questionnaire was completed to determine the amount of prior experience with the HCP and S3D enhancements.

Each participant had to use the application with the four different set-ups in random order. A training scene at the beginning of each phase enabled users to become familiar with the controls and the enhancement. During training users were given feedback on their selections (i.e. whether their choice was correct). The recording phase began when the participants felt competent at completing the task at a relatively fast speed.

After completing a task participants had to answer a questionnaire. For each task the amount of discomfort, the realism of the technique, and the perceived ease and accuracy of performing the task were assessed on a 5-point Likert scale. Open-ended questions assessed the depth cues used, and users were allowed to give general comments regarding the test. After completing the tests for all four set-ups, the comfort, preference, and perceived effectiveness and ease-of-use of each enhancement were ranked.

The user study had 13 participants aged 18 to 24 years. All of them were university students. The majority of participants had previous experience with conventional 3D applications, most commonly with Blender, CAD tools and/or computer games. Ten users had experienced S3D at least once, from either 3D movies or comic books. None of the subjects had prior experience with head coupled perspective systems.

Figure 8: Bar plot of the fraction of selections in which the closest plate was correctly identified (0.0-1.0) for each set-up (No Enhancement, HCP, 3D, 3D & HCP). The overlaid intervals represent the 95% confidence intervals.

The results of using the four set-ups for testing the accuracy of depth perception are shown in Figure 8. All enhancements improved the accuracy of depth perception. Combining HCP and S3D resulted in the highest number of correct answers (98.8%). For S3D, HCP and no enhancement the number of correct answers was 93.3%, 78.4%, and 62.9%, respectively. In the combined enhancement test, subjects reported that they found it easy to use S3D to determine depth when the difference between plates was large, while HCP was most useful when the difficulty increased.

When compared to Arthur et al. (1993), HCP and S3D still provided a general improvement of depth perception accuracy. In our case the S3D result is significantly better than that for HCP, which is the opposite of the findings reported by Arthur et al. We hypothesise that the following factors could have led to this difference:

• Our vision-based tracker has a higher latency and lower sensitivity than the armature tracker used by Arthur et al.

• The effectiveness of S3D and HCP depends on the chosen application.

• Our anaglyph S3D has less crosstalk, which makes it more beneficial than the old shutter technology.

Unfortunately we did not have access to the equipment and software used by Arthur et al. This prevented us from further investigating the reasons for the disparity between the results. An important conclusion we can draw, however, is that the benefits of S3D and HCP depend on the chosen implementation and use case.

Figure 9: Box plot of the task completion times (test reaction time in ms) for each set-up (No Enhancement, 3D, HCP, 3D & HCP).

The task completion time for each set-up is shown in Figure 9. The median task completion time when using HCP is approximately twice that when using no enhancement. This can be attributed to the physical movement required for HCP. Conversely, S3D was fastest because no movement was required. When using both HCP and S3D, the majority of participants used S3D at the beginning and then switched to HCP when the task got more difficult, i.e. as depth differences decreased. Hence the recorded time is only slightly longer than for S3D.

Table 5 shows a pairwise comparison of how comfortable participants perceived the different set-ups to be. Using no enhancement received the highest comfort rating, whereas S3D received the lowest. This can be attributed to the discomfort of wearing a physical device. More importantly, most users complained about colour fatigue after performing the S3D tests. Interestingly, HCP was perceived as more comfortable than no enhancement. One reason might be that users only had to use head motions when the displayed configurations were ambiguous, whereas in simple cases size was sufficient to give the correct answer.

                 1 (None)   2 (HCP)   3 (S3D)   4 (HCP & S3D)
1 (None)             -        42%       67%         67%
2 (HCP)             50%        -        42%         50%
3 (S3D)             25%       42%        -          25%
4 (HCP & S3D)       25%       42%       50%          -

Table 5: Pairwise comparison of comfort ratings. The values indicate the proportion of users who found the row enhancement was more comfortable than the column enhancement.

Table 6 shows a pairwise comparison of participants’ preference for completing the given task using different set-ups for depth perception. Very few users preferred the no enhancement option. HCP and S3D rated about equally well, and HCP & S3D combined received the highest ratings and was preferred over all other options by the majority of participants.

                 1 (None)   2 (HCP)   3 (S3D)   4 (HCP & S3D)
1 (None)             -        8.3%      8.3%       16.7%
2 (HCP)            83.3%       -       41.7%        25%
3 (S3D)            83.3%      50%        -           8.3%
4 (HCP & S3D)       75%      66.7%     66.7%         -

Table 6: Pairwise comparison of preference of enhancements. The values indicate the proportion of users who found the row enhancement to be more preferable than the column enhancement.

In summary, S3D provides the best depth perception: both accuracy and task completion time were better than for HCP. In terms of user comfort, HCP is favoured over S3D. Colour fatigue is a major drawback of anaglyph S3D and usually occurred after only around 10 minutes; most real-world 3D applications require considerably longer interaction times. Overall the combination of HCP and S3D was preferred, mostly because of its superior depth perception. Although HCP had a lower performance than S3D, it still offers considerably improved depth perception with no negative effect on user comfort. HCP is hence the most viable solution for applications requiring improved depth perception during protracted tasks.

We added HCP to the popular 3D modelling and animation tool "Blender". An example of the effects thus achieved is illustrated in Figure 10. The addition of HCP dramatically improves depth perception and perceived realism. Several limitations exist and we made the following observations:

• The modifications described in section 5 currently only affect the display routine. Interaction with objects, such as selecting vertices, does not work correctly when the head position changes.

• Blender only updates the view when the displayed scene changes. Head movements are not detected by Blender itself and hence a redisplay must be initiated manually.

• HCP is rendered in any perspective view (but not in orthographic views). Hence the traditional 4-view layout works as expected, with the addition of HCP for the perspective view.

• If a display window is not full-screen and not centred, then the view projection is incorrect, since we assume that the user is seated in front of the centre of the display window. This is, however, barely noticeable when using only one display window.

• When using more than one active perspective view, they are all rendered with the same head offset. Ideally we would like to adjust the head offset depending on the user's position relative to the display window's position on the screen.

Figure 10: The effects achieved by integrating HCP into the modelling and animation tool “Blender”.

Head coupled perspective is a viable alternative to stereoscopy for enhancing the realism of 3D computer applications. Both head coupled perspective and stereoscopy improve a user's perception of depth in a static environment. Our testing showed that head coupled perspective is slightly less effective than stereoscopy. However, we believe that HCP can become more popular in the future due to its simple implementation and high comfort rating, especially for time-consuming tasks. A key requirement will be the development of technologies for adding HCP to existing applications and media without necessitating modifications.

Integration with the OpenGL library has been accomplished using hooking. We demonstrated the concept for the popular modelling tool Blender. The application worked well for exploration tasks in full-screen mode. However, problems exist when using smaller windows and when interacting with the scene, such as selecting objects. In addition the possible field-of-view is constrained. These shortcomings need to be overcome before the technique can be used in a wider range of applications.

Future work will improve the integration of our technology into the programmable rendering pipelines of both Direct3D and OpenGL. This would allow head coupled perspective to be used in a much larger range of applications. To do this, more sophisticated ways of isolating the projection matrix need to be developed, as the transformation matrices are typically pre-multiplied inside the application. The current integration method also breaks mouse input, as mouse picking no longer uses the same projection as that used to render the scene. Further research is needed to determine whether a solution to this is possible with the current integration approach.

We also want to develop better algorithms for mapping from the application's virtual camera to a virtual window. This would greatly improve the usability of applications that require a large field-of-view, such as first-person shooters, and would also require less calibration by the end-user to achieve a realistic effect.

Further testing needs to be done to determine how the performance benefits of stereoscopy and head coupled perspective change depending on the type and difficulty of the task being evaluated. It would also be interesting to determine how user preferences change when taking into account cost and rendering performance penalties. In addition, testing needs to be performed using static and dynamic environments, and a direct comparison with other technologies is required.

Another major area for future research is adapting the head coupling algorithm so that it works with pre-recorded media such as film and television, not just 3D computer applications. Limited 3D information suitable for this can be extracted automatically from 2D frames using the algorithm by Hoiem et al. (2005).

Arthur, K. W., Booth, K. S., Ware, C. (1993), Evaluating 3D task performance for fish tank virtual worlds, ACM Transactions on Information Systems (3), pp. 239-265.

ARToolworks (2011), ARToolKit Home Page, http://www.hitl.washington.edu/artoolkit (Last accessed: 2011, August 26).

ARToolworks (2011b), How does ARToolkit work? http://www.hitl.washington.edu/artoolkit/documentation/userarwork.htm (Last accessed: 2011, August 26).

Brainerd, W. (2000), APIHijack - A Library for easy DLL function hooking, http://www.codeproject.com/KB/DLL/apihijack.aspx (Last accessed: 2011, August 26).

Demiralp, C., Jackson, C. D., Karelitz, D. B., Zhang, S., Laidlaw, D. H. (2006), CAVE and Fishtank Virtual-Reality Displays: A Qualitative and Quantitative Comparison, IEEE Transactions on Visualization and Computer Graphics (3), pp. 323-330.

Dynamic Digital Depth (2011), TriDef - Stereoscopic 3D Software, http://www.tridef.com/home.html (Last accessed: 2011, August 26).

Fauster, L. (2007), Stereoscopic Techniques in Computer Graphics, Project report, Technische Universität Wien, Austria, http://www.cg.tuwien.ac.at/research/publications/2006/Fauster-06-st/Fauster-06-st-.pdf.

Gateau, S. (2009), The In and Out: Making Games Play Right with Stereoscopic 3D Technologies, Game Developers' Conference, http://developer.download.nvidia.com/presentations/2009/GDC/GDC09-3DVision-The_In_and_Out.pdf.

Hoiem, D., Efros, A., Hebert, M. (2005), Automatic photo pop-up, in ACM Transactions on Graphics (3), pp. 577-584.

Hunt, G., Brubacher, D. (1999), Detours: Binary Interception of Win32 Functions, in Third USENIX Windows NT Symposium, USENIX, http://research.microsoft.com/apps/pubs/default.aspx?id=68568 (Last accessed: 2011, August 26).

iZ3D Software (2011), iZ3D Drivers download page, http://www.iz3d.com/driver (Last accessed: 2011, August 26).

Lee, J. C. (2008), Head Tracking for Desktop VR Displays using the Wii Remote, http://johnnylee.net/projects/wii/ (Last accessed: 2011, August 26).

Marks, S. (2011), A Virtual Environment for Medical Teamwork Training With Support for Non-Verbal Communication Using Consumer-Level Hardware and Software, PhD Thesis, Dept. of Computer Science, University of Auckland.

Microsoft (2011), MSDN Windows Hooks, http://msdn.microsoft.com/en-us/library/ms632589(VS.85).aspx (Last accessed: 2011, August 26).

Mulder, J. D., van Liere, R. (2000), Enhancing Fish Tank VR, in Proceedings of the IEEE Virtual Reality 2000 Conference (VR '00). IEEE Computer Society, Washington, DC, USA, pp. 91-98.

Nichols, S., Patel, H. (2002), Health and safety implications of virtual reality: a review of empirical evidence, Applied Ergonomics (3), pp. 251-271.

NVidia (2011), NVidia 3D Vision, http://www.nvidia.co.uk/object/GeForce_3D_Vision_Main_uk.html (Last accessed: 2011, August 26).

Qi, W., Taylor, R. M., Healey, C. G., Martens, J.-B. (2006), A comparison of immersive HMD, fish tank VR and fish tank with haptics displays for volume visualization, in Proceedings of the 3rd symposium on Applied perception in graphics and visualization (APGV '06). ACM, New York, NY, USA, pp. 51-58.

Rekimoto, J. (1995), A vision-based head tracker for fish tank virtual reality-VR without head gear, in Proceedings of the Virtual Reality Annual International Symposium (VRAIS '95). IEEE Computer Society, Washington, DC, USA, pp. 94-100.

Rogers, B., Graham, M. (1979), Motion parallax as an independent cue for depth perception, Perception, pp. 125-134.

Runde, D. (2000), How to realize a natural image reproduction using stereoscopic displays with motion parallax, IEEE Transactions on Circuits and Systems for Video Technology (3), pp. 376-386.

Seeing Machines (2010), FaceAPI Homepage, http://www.seeingmachines.com/product/faceapi/ (Last accessed: 2011, August 26).

Sexton, I., Surman, P. (1999), Stereoscopic and autostereoscopic display systems, IEEE Signal Processing Magazine (3), pp. 85-99.

Sko, T. (2008) Using Head Gestures in PC Games, http://www.youtube.com/watch?v=qWkpdtFZoBE (Last accessed: 2011, August 26).

Sko, T., Gardner, H. J. (2009), Head Tracking in First-Person Games: Interaction Using a Web-Camera, in Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (INTERACT '09), Springer-Verlag, Berlin, Heidelberg, pp. 342-355.

Suenaga, T., Tamai, Y., Kurita, Y., Matsumoto, Y., Ogasawara, T. (2008), Poster: Image-Based 3D Display with Motion Parallax using Face Tracking, in Proceedings of the 2008 IEEE Symposium on 3D User Interfaces (3DUI '08). IEEE Computer Society, Washington, DC, USA, pp. 161-162.

Ware, C., Arthur, K., Booth, K. S. (1993), Fish tank virtual reality, in Proceedings of the INTERACT '93 and CHI '93 conference on Human factors in computing systems (CHI '93). ACM, New York, NY, USA, pp. 37-42.

Willow Garage (2011), OpenCV Wiki, http://opencv.willowgarage.com/wiki (Last accessed: 2011, August 26).

Yim, J., Qiu, E., Graham, T. C. N. (2008), Experience in the design and development of a game based on head-tracking input, in Proceedings of the 2008 Conference on Future Play: Research, Play, Share (Future Play '08). ACM, New York, NY, USA, pp. 236-239.

Yuan, H., Sachtler, W. L., Durlach, N., Shinn-Cunningham, B. (2000), Effects of Time Delay on Depth Perception via Head-Motion Parallax in Virtual Environment Systems, Presence: Teleoperators and Virtual Environments (6), pp. 638-647.


An Evaluation of a Sketch-Based Model-by-Example Approach for Crowd Modelling

Li Guan, Burkhard C. Wunsche

Graphics Group, Department of Computer Science, University of Auckland, New Zealand

Email: [email protected], [email protected]

Abstract

An increasing number of computer applications require complex 3D environments. Examples are entertainment (games and movies), advertisement, social media technologies such as "Second Life", education, urban planning, landscape design, search and rescue simulations, visual impact studies and military simulations. Many virtual environments contain thousands of similar objects such as characters, trees, and buildings. Placing these objects by hand is cumbersome, whereas an automatic placement does not allow sufficient control over the desired distribution characteristics. In previous work we presented a prototype for a sketch-based model-by-example approach to generate large distributions of objects from sketched example distributions. In this paper we present an improved algorithm and we perform a formal user study demonstrating that the approach is indeed intuitive, effective, and that it works for a large number of regular, irregular and clustered distribution patterns. Remaining limitations related to Gestalt and semantic concepts are illustrated and discussed.

Keywords: sketch-based modeling, sketch-based interface, crowd modeling, texture synthesis, mass-spring system

1 Introduction

The use of virtual environments (VE) is expanding rapidly and applications range from entertainment (games and movies), to education, social media (e.g., "Second Life"), architecture, engineering, and urban design and planning. Creating virtual environments can be a time-consuming process, especially when modelling scenes containing thousands of similar objects such as characters, trees, and buildings. Such aggregations of objects are, however, necessary to make computer generated scenes look natural and visually attractive. Placing objects individually is cumbersome, whereas using statistical models, such as regular or random patterns, does not give the user sufficient control and often looks artificial.

In previous work we showed that a model-by-example technique using sketch input is a promising approach to rapidly generate crowds (Guan & Wunsche 2011). Sketching provides complete freedom over the input, encourages creativity (Gross & Do 1996), and facilitates problem solving (Wong 1992). Large point distributions are defined by sketching the outline of the domain of the distribution (e.g., the boundary of a forest) and then sketching a small example distribution, which is replicated over the domain using a combination of texture synthesis and proprietary techniques.

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126, Haifeng Shen and Ross Smith, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

The technique allows the generation of a large number of different outputs, but suffers from several shortcomings, especially regarding the synthesis of clustered distributions. Also, no user study was performed to confirm that the method is indeed effective in practice. In this paper we present a novel physics-based technique for optimising clustered distributions, while maintaining the characteristics of the original input. A user study confirms that the technique is intuitive, effective, efficient and fun.

In the following discussion we use the term "crowd" to mean any aggregation of objects, e.g., a forest (trees), herd (animals), village (residential buildings), or city (skyrises).

Section 2 reviews the literature in this field. Section 3 presents design requirements and our previously published algorithm for crowd modelling. Section 4 presents a new algorithm for synthesising clustered distributions and section 5 gives implementation details. The algorithm is evaluated in section 6 using experimental results and a user study. We conclude the paper with section 8, which also gives an overview of future work.

2 Literature Review

A large number of mathematical methods exist for creating point distributions. Most of them are concerned with creating random distributions with certain statistical properties. For example, Poisson disk sampling patterns are popular in the graphics community, e.g., for rendering and illumination (Jones 2006), because they have minimal low frequencies and no spikes in the power spectrum. Quasi-Monte Carlo methods are popular in problems involving integration, such as global illumination (Szirmay-Kalos 2008) and area computation of point-sampled surfaces (Liu et al. 2006). Near-uniform distributions with user-defined characteristics, such as alignment with a vector field, can be created using diffusion-advection equations (Botchen et al. 2005).

The literature offers far fewer references regarding point distributions for object placement. Many applications define the positions of large groups of objects using application-specific physically or statistically motivated techniques, similar to the ones explained above. The popular landscape synthesis tool "Terragen" uses environmental parameters and directional controls to modify a fractal noise texture specifying the location of vegetation (Planetside Software 2006). Procedural methods have been used for city simulations (Greuter et al. 2003). Diffusion-advection equations are useful for time-dependent processes with distance constraints such as traffic patterns (Garcia 2000). Bayesian decision processes (Metoyer & Hodgins 2004) and partial differential equations have been used to describe local and global behaviour patterns of crowds (Treuille et al. 2006). Crowd behaviour, based on a given initial position, can be simulated using agent-based methods (Reynolds 1987, Funge et al. 1999, Sung et al. 2005, Massive Software 2009).

Professional crowd simulation tools usually offer interfaces for randomly generating crowds over a user-defined domain by specifying the size and/or density of characters (WorldOfPolygons.com 2006). A spray interface for distributing grass, trees and other objects over a terrain has been presented by van der Linden (2001).

3 Crowd Modelling Prototype

In previous research we designed a prototype for sketch-based crowd modelling (Guan & Wunsche 2011).

3.1 Requirement Analysis

The solution was motivated by an analysis of crowds in photographs and by a user study. The evaluation showed that most crowds can be characterised by the shape of their domain and their distribution pattern. Most patterns could be divided into three classes: random, regular and clustered. Within each class there is an infinite number of different distribution patterns, e.g., a regular distribution can be a rectangular grid, or a more complex repetitive arrangement. In either case the pattern can be completely regular or have different degrees of jitter in it.

Our analysis showed that a feasible way to define a large variety of crowds and collections of objects is to define their domain and an example distribution. The program must be able to differentiate between different types of distributions, such as regular, irregular and clustered, and must be able to replicate the characteristics of any such pattern without merely repeating it.

3.2 Design

Our original solution (Guan & Wunsche 2011) allows users to sketch a domain and a sample distribution using dots or short strokes. The sketched example distribution is analysed using a k-means++ algorithm (Shindler 2008) and a Euclidean shortest spanning tree to determine the number of clusters in the user's input. If the user input contains only one cluster then we determine whether it is regular or stochastic by analysing the distribution of edges of a Euclidean shortest spanning tree.

For a regular input distribution we choose the smallest enclosing square and use the thus created texture image as exemplar for a Wang tiling texture synthesis algorithm (Cohen et al. 2003). We found that this algorithm preserves structures in the input texture well. For irregular distributions we choose the exemplar texture analogously, but then apply a Chaos Mosaic algorithm (Guo et al. 2000).

For a clustered input we compute the mean and standard deviation of the size of all clusters and of the distances of the points in them to the respective centres. We then generate new clusters based on these probability distributions. The clusters are then randomly placed subject to a minimum distance criterion.
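
A sketch of this generation step is given below, assuming the measured statistics are used as parameters of normal distributions; the Vec2 type and all names are illustrative, not the authors' code.

#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

struct Vec2  { float x, y; };
struct Stats { float mean, stddev; };  // measured from the example input

// Generate one new cluster around 'centre' by sampling the number of
// points and their distances to the centre from the measured statistics.
std::vector<Vec2> synthesiseCluster(Vec2 centre, Stats size, Stats dist,
                                    std::mt19937& rng) {
    std::normal_distribution<float> nPoints(size.mean, size.stddev);
    std::normal_distribution<float> radius(dist.mean, dist.stddev);
    std::uniform_real_distribution<float> angle(0.0f, 6.2831853f);  // [0, 2*pi)
    int n = std::max(1, static_cast<int>(nPoints(rng) + 0.5f));
    std::vector<Vec2> points;
    for (int i = 0; i < n; ++i) {
        float r = std::fabs(radius(rng));  // distance from the cluster centre
        float a = angle(rng);
        points.push_back({centre.x + r * std::cos(a),
                          centre.y + r * std::sin(a)});
    }
    return points;
}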

The synthesis of clusters does not preserve the characteristics of the user input. For example, we are unable to replicate uniformly spaced clusters. If the user input contains randomly distributed clusters, then the synthesised result might still not look realistic, since the power spectrum of the synthesised cluster positions can vary dramatically from that of the exemplar.

4 Cluster Synthesis

The above described algorithm suffers from a poor synthesis of clustered input. In order to find a solution we observe that we can replace clusters with their centroids. The resulting point distribution can be used as input for the original algorithm, and the synthesised point distribution then represents the locations of all synthesised clusters. The generation of individual clusters is then performed as described in (Guan & Wunsche 2011). A flow chart of the resulting improved algorithm is shown in figure 1.

Figure 1: Flowchart of the improved algorithm for sketch-based crowd modelling.

Note that in theory an example input can contain clusters of clusters, e.g., the soldiers in an army can be arranged in, say, n × m large groups, where each large group contains k × l small groups, and each small group has soldiers standing in a grid-like pattern. Such cases could be resolved by recursively applying the above algorithm, but since this case neither occurred in our user studies nor in our evaluation of image databases, we did not implement this generalisation.

The above method does not put any constraints on the distance between clusters. As a result it is possible that cluster positions are synthesised such that clusters overlap and appear as one big cluster. This situation must be avoided, since the resulting distribution does not reflect the properties of the example input and hence looks unintuitive. The situation is illustrated in figure 2.

Figure 2: Left: Example point distribution (black dots) and the corresponding cluster centres (red dots) and bounding circles (red). Right: A hypothetical synthesised distribution where two cluster centres are too close, resulting in overlapping clusters, which are perceived by users as one uncharacteristically large cluster.

4.1 Cluster Optimisation

In order to improve the synthesised clusters we need to find a solution which ensures that cluster centres are at least a distance $d_{minClusters}$ apart. In the examples in this section we set $d_{minClusters}$ to the diameter of the largest cluster. This setting is good for illustration purposes, but does not result in a clear visual differentiation between clusters. We hence recommend using, in practice, a $d_{minClusters}$ of at least 1.5 times the largest cluster diameter.


Figure 3: An irregular distribution of cluster centres (black dots) and the corresponding cluster sizes (red circles) and Delaunay triangulation (blue lines). (a) The original configuration. (b) The result of applying a traditional mass-spring system where the rest length of each spring is the maximum of the original edge length and the cluster diameter. (c) The result of applying our modified mass-spring system.

Figure 4: A jittered regular distribution of cluster centres (black dots) and the corresponding cluster sizes (red circles) and Delaunay triangulation (blue lines). (a) The original configuration. (b) The result of applying a traditional mass-spring system where the rest length of each spring is the maximum of the original edge length and the cluster diameter. (c) The result of applying our modified mass-spring system.

A naive approach shifting clusters until they don't overlap could significantly change the distribution pattern generated by the underlying texture synthesis method (see figure 1). We optimise cluster positions using a mass-spring system. Cluster centres represent the mass points of the mass-spring system, whereas the springs are given by the edges of the points' Delaunay triangulation. Examples are given in part (a) of figures 3 and 4.

All springs are given the same spring constant $k_{standardSpring}$. If the springs' rest lengths are equal to the corresponding edge lengths of the Delaunay triangulation, then no forces are generated and the system is in balance. We set a spring's rest length $l_{rest}$ to:

\[ l_{rest} = \max(l_{edge}, d_{minClusters}) \]

where $l_{edge}$ is the length of the corresponding edge in the Delaunay triangulation and $d_{minClusters}$ is, as explained above, the desired minimum distance between clusters.

If the distance between two cluster centres is smaller than $l_{rest}$ then a force is generated which pushes them apart (see section 5). The result of applying this algorithm to the configuration in part (a) of figures 3 and 4 can be seen in part (b) of those figures.

Two problems can be observed:

• Clusters are still overlapping.

• Some points shift significantly from their original position, which changes the appearance of the pattern generated in the synthesis step. For example, the pattern in figure 4 (b) no longer looks like a regular grid.

The cause of these problems is that spring forces change linearly with length changes from the rest length. It is not possible to specify that some position changes, e.g., in order to avoid cluster overlap, are more important than others.

4.1.1 Non-Linear Springs

In order to overcome the problem of overlapping clusters we use non-linear springs, where the spring force increases dramatically if the spring length is less than the desired minimum cluster distance $d_{minClusters}$. This is achieved by computing the current length $l$ of a spring and increasing its spring constant $k_{standardSpring}$ by a factor $f$ if $l < d_{minClusters}$.

The physical interpretation of this modification is illustrated in figure 5: Given are two points connected by a spring with length $l_{rest}$ and spring constant $k_{standardSpring}$. Moving the points apart results in a force pulling them together (b), whereas pushing the points closer together results in a force pushing them apart (c). If the distance between the points is less than $d_{minClusters}$ then the spring is replaced with one of equal length, but with a much higher spring constant $f \cdot k_{standardSpring}$. The result is a spring which is relatively easy to extend, or to compress down to a length of $d_{minClusters}$, but very difficult to compress any further than this.

Figure 5: Physical interpretation of a non-linear spring: Two points are initially connected by a spring with rest length $l_{rest}$ and spring constant $k_{small}$ (a), resulting in moderate forces resisting an extension (b) or compression (c) of the spring. If the distance between the points is less than $d$ then we replace the spring with a new one, which has the same rest length but a much higher spring constant $k_{large}$. The result is a spring which is relatively easy to extend, or to compress down to a length of $d$, but very difficult to compress any further than that.

4.1.2 String-Springs

In order to prevent large movements away from the original cluster positions we record the positions of the cluster centroids and connect them with springs to their current positions. We want cluster centroids to be able to move from their original position by a distance of $d_{maxOffset}$, but any further movement should be very difficult.

We achieve this behaviour using a new concept we term a string-spring. A string-spring can be physically imagined as a string of length $d_{maxOffset}/2$ connected to a spring with a rest length of $d_{maxOffset}/2$. An example is given in figure 6 (a). As long as the two points connected by the string-spring are less than $d_{maxOffset}$ apart, no force is generated, since the string can compensate for any position change of the spring, i.e., the spring is never compressed (figure 6 (b)). However, if the two points are moved further than $d_{maxOffset}$ apart, then any amount above this threshold results in an extension of the spring and a force pulling the two points together (figure 6 (c)).

Figure 6: A string-spring can be physically imagined as a string of length $d/2$ connected to a spring with a rest length of $d/2$ (a). If the two points connected by the string-spring are less than $d$ apart, no force is generated, since the string prevents compression of the spring (b). If the distance exceeds $d$, then any amount above this threshold results in an extension of the spring and a force pulling the two points together (c).

The results of applying our improved mass-spring system to the configuration in part (a) of figures 3 and 4 are shown in part (c) of those figures. It can be seen that, in contrast to the application of the original mass-spring system (part (b) of the figures), the clusters do not overlap and the overall distribution pattern has a higher resemblance to the original one. This is best illustrated by figure 4, where both images (a) and (c) look like a regular grid with some jitter, whereas image (b) looks slightly random.

5 Implementation Details

We have implemented the above described algorithms using Microsoft Visual C++ and OpenGL. So far we have only integrated the generation of sketched tree objects with our crowd generation software.

5.1 Mass-Spring System

The mass-spring system is implemented as a special case of a particle system, where the $n$ particles are the cluster centroids. Each particle $P_i$ has a position $x_i$, velocity $v_i$, acceleration $a_i$ and applied force $F_i$.

The applied force at any time is given by the sum of the forces of all springs connected to that particle. If two particles $P_i$ and $P_j$ with positions $x_i$ and $x_j$, respectively, are connected by a spring with rest length $l_{ij}$ and spring constant $k_{ij}$, then the resulting force is given by Hooke's law:

\[ F_{ij} = -k_{ij}\left(|x_i - x_j| - l_{ij}\right)\frac{x_i - x_j}{|x_i - x_j|} \]

The second term represents the difference between the current length and the rest length of the spring, and the third term is a unit vector expressing the direction of the resulting force. In our mass-spring system the term $F_{ij}$ is added to the total force acting on particle $P_i$, and the term $-F_{ij}$ is added to the total force acting on particle $P_j$.

The spring constant is $k_{standard}$, but increases to $f \cdot k_{standard}$ if the distance between two particles falls below $d_{minClusters}$. For the string-springs we compute the distance between the original and current particle positions. If the distance is larger than $d_{maxOffset}$ then we apply equation 5.1 with a spring constant $k_{stringSpring}$.

The constant $f \cdot k_{standard}$ must have the highest value, since non-overlapping clusters are most important. The constant $k_{stringSpring}$ must be much larger than $k_{standard}$ in order to avoid large displacements of cluster centres. We use:

\[ k_{standard} = 1.0, \qquad f = 80.0, \qquad k_{stringSpring} = 30.0 \]
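
For illustration, a minimal C++ sketch of these force rules under the stated constants follows; the Vec2 type and helper operators are assumptions, not the authors' code.

#include <cmath>

struct Vec2 { float x, y; };
Vec2  operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
Vec2  operator*(Vec2 a, float s) { return {a.x * s, a.y * s}; }
float length(Vec2 v) { return std::sqrt(v.x * v.x + v.y * v.y); }

const float kStandard     = 1.0f;
const float f             = 80.0f;  // stiffening factor below d_minClusters
const float kStringSpring = 30.0f;

// Hooke's law: force on the particle at xi from a spring to xj.
Vec2 hookeForce(Vec2 xi, Vec2 xj, float rest, float k) {
    Vec2 d = xi - xj;
    float len = length(d);
    return d * (-k * (len - rest) / len);
}

// Edge spring with non-linear stiffening when clusters get too close.
Vec2 springForce(Vec2 xi, Vec2 xj, float rest, float dMinClusters) {
    float k = (length(xi - xj) < dMinClusters) ? f * kStandard : kStandard;
    return hookeForce(xi, xj, rest, k);
}

// String-spring: no force until the particle has drifted more than
// dMaxOffset from its original position, then a stiff restoring force.
Vec2 stringSpringForce(Vec2 x, Vec2 xOrig, float dMaxOffset) {
    if (length(x - xOrig) <= dMaxOffset) return {0.0f, 0.0f};  // string is slack
    return hookeForce(x, xOrig, dMaxOffset, kStringSpring);
}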

5.2 Numerical Solution

The above mass-spring system results in a physical simulation where particle positions change over time. We are interested in its steady-state solution, i.e., where the sum of all forces acting on the particles is zero. In order to get a unique solution two requirements must be fulfilled:

• We need a fixed point of reference. This is achieved by computing the convex hull of all particle positions and fixing the positions of all points on the boundary of the convex hull. Points less than $d_{minClusters}$ apart from an already fixed point are not fixed (since we want them to move apart).

• We need to introduce a damping term to prevent the system from oscillating. The damping term removes kinetic energy and thus enables the system to reach a steady state.

The final mass-spring system is described by the set of equations

\[ m_i\, a_i = F_i - c\, v_i, \qquad i = 1, \ldots, n \]

where $F_i$ is the sum of all spring forces acting on particle $P_i$ and $c$ is the viscous damping coefficient, which we set to 0.2. In order to solve the system it is converted to a system of ordinary differential equations (ODEs), which can be expressed by two vectors: the current state, containing the positions and velocities of all particles, and the state derivative, containing the velocities and accelerations of all particles (Witkin et al. 1994). The initial state is given by the original particle positions and by setting all velocities and accelerations to zero. The final position of all particles can then be easily computed by using an ODE solver, which terminates if the position changes in each time step are below a given threshold. Note that we use stiff springs (high spring constants), so the Euler method is numerically unstable. We use a fourth-order Runge-Kutta method (Press et al. 1992).
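
A generic sketch of one such Runge-Kutta step over the stacked state vector follows; derivative() is assumed to evaluate the state derivative as described above, so this is not the authors' solver.

#include <vector>

std::vector<float> derivative(const std::vector<float>& state);  // assumed

// One fourth-order Runge-Kutta step of size h over the state vector
// (positions and velocities stacked together).
std::vector<float> rk4Step(const std::vector<float>& s, float h) {
    auto blend = [](std::vector<float> a, const std::vector<float>& b, float w) {
        for (std::size_t i = 0; i < a.size(); ++i) a[i] += w * b[i];
        return a;
    };
    std::vector<float> k1 = derivative(s);
    std::vector<float> k2 = derivative(blend(s, k1, h / 2));
    std::vector<float> k3 = derivative(blend(s, k2, h / 2));
    std::vector<float> k4 = derivative(blend(s, k3, h));
    std::vector<float> next = s;
    for (std::size_t i = 0; i < s.size(); ++i)
        next[i] += (h / 6) * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
    return next;  // caller iterates until position changes fall below a threshold
}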

6 Results

6.1 Experimental Results

We have evaluated both the individual components of our algorithm, and the algorithm as a whole.

6.2 Exemplar Classification

Figure 7 shows the results of classifying input into regular (a), irregular (b)-(e), and clustered patterns (f)-(i). Overall the classification works well. The clustering algorithm fails if two clusters' bounding boxes overlap, e.g., nested v-shaped point distributions. This is due to the distance metric used in the k-means++ algorithm.

Figure 7: Examples of user input classified as regular (a), irregular (b)-(e), and clustered patterns (f)-(i).

6.3 Examples

Figures 8-10 show examples of regular, irregular and clustered input (red boxes), respectively, the resulting synthesised point distributions, and 3D scenes generated with them.

6.4 Limitations

The texture classification algorithm is unable to recognise Gestalt concepts. This was already demonstrated in figure 7, where items (d) and (e) were classified as irregular. The reason for this is that we test for regularity by constructing a Euclidean shortest spanning tree and then analysing the distribution of its angles and edges. For a regular grid, for example, the distances to the neighbouring vertices all have approximately similar lengths. The angles with the x-axis are clustered around two values, e.g., roughly zero degrees or roughly 90 degrees if the grid is axis-aligned. However, for the diamond shape in image (e) the distribution of edge angles is not bimodal.

Figure 8: Example of a regular input (red box) and the synthesised point distribution (left) and model of a plantation forest generated with it (right).

Figure 9: Example of an irregular input (red box) and the synthesised point distribution (left) and model of a natural forest generated with it (right).

Figure 10: Example of a clustered input (red boxes) and the synthesised point distribution (left) and model of an urban park with clusters of trees generated with it (right).

Figure 11: Example of an input point distribution (red dots in box) with Gestalt information (star shape). The input is classified as irregular and as a consequence the Chaos Mosaic algorithm is applied, which results in an unexpected output.


This problem extends to clustered distributions. For example, the input in figure 7 (g) is correctly classified as clustered. However, the point distribution within each cluster is recognised as irregular. As a result, newly generated clusters contain a random distribution of points generated with the Chaos Mosaic algorithm. An improvement over the current solution would be to recognise input distributions with Gestalt information and just repeat them using Wang tiles or another tile-based algorithm. We have surveyed a wide class of texture synthesis algorithms (Guan & Wunsche 2011, Manke & Wunsche 2010), but we are not aware of any technique to replicate semantic information and Gestalt concepts in a natural manner without just repeating the input.

A second problem is that example distributions with a small number of points are insufficient to synthesise realistic-looking results. As an example consider figure 12. The input consists of six clusters, which is enough for the Wang tiling algorithm to generate a realistic regular distribution of cluster centroids. For each cluster centroid a new cluster is synthesised. Since one of the input clusters is regular, the algorithm also produces some regular clusters. The number of points in the synthesised clusters depends on the variation within the input clusters. Since only one input cluster is regular, all synthesised regular clusters have the same number of points, i.e., four, and they all have the same pattern (encircled in yellow).

Figure 12: Example of a clustered input (red boxes) with regularly distributed cluster centroids. The clusters themselves have irregular distributions with the exception of one (encircled in yellow). The synthesised point distribution (left) suffers from repetitions, which however are barely noticeable in the resulting 3D model (right).

6.5 User StudyWe evaluated the usability, efficiency and effectiveness ofour algorithm with a user study. Participants had to com-plete three tasks:

• Task 1: Modelling a plantation forest with hundredsof trees (a picture of a real plantation forest demon-strating the near regular arrangement of trees wasshown).

• Task 2: Modelling a natural forest with hundreds oftrees (an aerial picture of a real natural forest demon-strating the random arrangement of trees was shown).

• Task 3: Modelling an urban park with clusters of trees (an aerial picture of a park with dozens of clusters of trees was shown).

We surveyed the participants after each task and at the end of the user study. Answers were recorded on a 7-level Likert scale ranging from -3 (strongly disagree) to 3 (strongly agree).

The study had 20 participants, 16 male and 4 female. All of the participants were university students or staff, with six aged 16-20, eleven aged 21-25, two aged 31-35, and one aged 40-45.

Task                           Average   SD
Task 1 (Plantation forest)     38.38     19.63
Task 2 (Natural forest)        37.79     22.61
Task 3 (Park)                  61.63     41.64

Table 1: Average completion times and standard deviations (SD) in seconds for tasks 1-3.

The participants were from the following departments: Computer Science (8), Commerce (5), Medicine (2), Arts (2), Education (2) and Pharmacy (1). Twelve of the participants had never used a modelling tool, four rarely used one, and four sometimes used one. Among the users who had used 3D modelling tools, the most commonly used software packages were Blender, Google SketchUp, and 3D Studio Max.

6.6 3D Modelling Tasks

Users were told that they had to sketch the outline of the modelled scene and an example distribution indicated by dots or short sketches. Users were able to clear and restart the input if they were unsatisfied with the results. In general, users required several tries to get a feeling for how the resulting distribution would look for a given input. The average completion times for modelling tasks 1-3 are shown in table 1. It can be seen that generating regular and irregular patterns is similarly easy, but that generating clustered patterns requires at least 50% more time on average. One user in particular struggled and initially sketched the shape of each cluster. The user required help from the study supervisor and took more than 160 seconds to complete the task.

We evaluated users' experiences with the tool for three modelling tasks involving the creation of a plantation forest (regular example input), a natural forest (random example input), and an urban park (clustered input). Table 2 summarises the results. It can be seen that all tasks were understood and easy to complete. Users strongly agreed that the tool simplified the modelling task, and they were satisfied with the results. The lowest satisfaction, albeit still positive, was for modelling a clustered distribution.

A general complaint was the lack of an "eraser" tool to correct mistakes and incrementally modify the input sketch until the example input generated the desired result. Another problem was the lack of information about how the density of points in the input would be reflected in the resulting 3D scene. Several users initially drew points too close together and had to restart after they saw the resulting 3D output.

When sketching a regular input distribution, several users had problems with the tool initially classifying the input as random. In a few instances users had to be told to sketch the input more carefully to make sure that it was recognised as regular input. A few users commented that the program should give feedback during the interaction.

Users struggled most when modelling clustered distributions. Several users represented clusters with circular sketches filled with points. In other cases clusters were too close together and were not recognised as individual clusters, resulting in unexpected output. Finally, several users drew clusters filling most of the domain, such that no new clusters were synthesised.

6.7 General Questions

In addition to the three tasks above we allowed users to experiment and model any distribution of their choice. Examples were buildings in a city, a flower bed, fish in the ocean, rabbits in the forest, people at a festival, people in a cinema, students in a playground, and the hospital ward shown in figure 13.

We then asked questions regarding the overall usability, usefulness, and user satisfaction with the application.


                                             Task 1 (Plantation forest)   Task 2 (Natural forest)   Task 3 (Park)
                                             Average   SD                 Average   SD               Average   SD
Q1: I understood the task                    2.25      0.71               2.45      0.76             2.20      1.01
Q2: The task was easy                        2.50      0.69               2.50      0.61             1.70      1.66
Q3: The tool simplified the modelling task   2.65      0.59               2.70      0.57             2.25      1.45
Q4: I am satisfied with the result           2.50      0.61               2.55      0.60             2.20      1.20

Table 2: Average responses on a 7-level Likert scale (from -3 to +3) and standard deviations (SD) for the questions on the left and the tasks on the top.

Figure 13: Example of a result of the free-drawing task in the user study: distribution of beds in a hospital ward (without partitioning walls).

The results are summarised in table 3. Overall, users were satisfied with the usability of the tool. Participants were able to successfully complete modelling tasks in less than a minute, and overall users agreed that the tool was intuitive. The majority of users agreed that the application is easier to use than traditional modelling tools, but large variations in the answers were observed. We believe that this might be due to the limited functionality of the tool and the lack of feedback (error messages, help).

Despite this, participants agreed that the tool is useful and a worthwhile addition to traditional modelling tools. Most users would use the application in future if they had the opportunity to do so. Overall, users were satisfied with the application and enjoyed using it. The lowest satisfaction was with the interface; the reasons were the lack of feedback, undo, and "eraser" functionalities.

7 Conclusion and Future Work

We have presented a novel tool for modelling large distributions of objects by sketching the boundary of the occupied domain and a small sample distribution. Our work extends a previously presented prototype and includes new functionalities for synthesising clustered distributions. This is achieved by representing clusters by their centroids and using the resulting point pattern as input for a synthesis step. In order to prevent clusters from overlapping or being located too close together, we developed a novel mass-spring system. We showed that this postprocessing step was able to correct cluster distances while still maintaining the overall characteristics of the synthesised pattern.
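A minimal sketch of such a mass-spring relaxation step is given below, assuming linear repulsion springs between centroid pairs closer than a minimum distance. The spring constant, the iteration count and the absence of damping are assumptions for illustration, not the parameters of our implementation.

import numpy as np

def relax_centroids(centroids, min_dist, k=0.1, iterations=100):
    pts = np.asarray(centroids, dtype=float).copy()
    for _ in range(iterations):
        forces = np.zeros_like(pts)
        for i in range(len(pts)):
            delta = pts[i] - pts                     # vectors towards centroid i
            d = np.hypot(delta[:, 0], delta[:, 1])
            close = (d > 0.0) & (d < min_dist)       # overlapping neighbours only
            if close.any():
                dirs = delta[close] / d[close, None]
                # Linear spring force proportional to the remaining overlap.
                forces[i] += (k * (min_dist - d[close])[:, None] * dirs).sum(axis=0)
        pts += forces
    return pts

Because the springs act only on pairs that are too close, well-separated centroids keep their synthesised positions, which preserves the overall characteristics of the pattern.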

We demonstrated that the tool can successfully generate a large number of regular, irregular and clustered distributions. Gestalt and semantic properties of input patterns cannot be synthesised and hence result in unexpected output. If the example input has too few points, this can lead to repetitive patterns.

A user study demonstrated that participants were satisfied to very satisfied with the application. Regular and irregular distributions could be generated in less than 40 seconds without help. Some users had problems with generating clustered output. The most commonly mentioned problems were related to the interface, i.e., the lack of feedback, no undo functionality, no "eraser", and a lack of immediate feedback on the effect of user input on the resulting 3D scene.

In future work we want to increase the range of reproducible input distribution patterns and, in particular, incorporate Gestalt concepts. In addition, we want to fully integrate this crowd modelling software into our "LifeSketch" software for prototyping virtual environments (Olsen et al. 2011, Yang & Wunsche 2010).

References

Botchen, R. P., Weiskopf, D. & Ertl, T. (2005), Texture-based visualization of uncertainty in flow fields, in ‘Proceedings of IEEE Visualization’, pp. 647–654.

Cohen, M. F., Shade, J., Hiller, S. & Deussen, O. (2003), ‘Wang tiles for image and texture generation’, ACM Trans. Graph. 22(3), 287–294.

Funge, J., Tu, X. & Terzopoulos, D. (1999), Cognitive modeling: knowledge, reasoning and planning for intelligent characters, in ‘Proc. of the 26th annual conference on Computer graphics and interactive techniques (SIGGRAPH ’99)’, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, pp. 29–38.

Garcia, A. L. (2000), Physics of traffic flow, in ‘Numerical Methods for Physics’, 2nd edn, Prentice Hall, chapter 7.

Greuter, S., Parker, J., Stewart, N. & Leach, G. (2003), Real-time procedural generation of ‘pseudo infinite’ cities, in ‘Proceedings of the 1st international conference on Computer graphics and interactive techniques in Australasia and South East Asia (GRAPHITE ’03)’, ACM, New York, NY, USA, pp. 87–ff.

Gross, M. D. & Do, E. Y.-L. (1996), Ambiguous intentions: a paper-like interface for creative design, in ‘Proceedings of the 9th annual ACM symposium on User interface software and technology (UIST ’96)’, ACM, New York, NY, USA, pp. 183–192.

Guan, L. & Wunsche, B. C. (2011), Sketch-based crowd modelling, in ‘Proceedings of the 12th Australasian User Interface Conference (AUIC 2011)’, pp. 67–76. http://www.cs.auckland.ac.nz/~burkhard/Publications/AUIC2011_GuanWuensche.pdf.

Guo, B., Shum, H. & Xu, Y.-Q. (2000), Chaos mosaic: Fast and memory efficient texture synthesis, Technical Report MSR-TR-2000-32, Microsoft Research. http://research.microsoft.com/pubs/69770/tr-2000-32.pdf.

Jones, T. R. (2006), ‘Efficient generation of Poisson-disk sampling patterns’, Image Rochester NY 11(2), 1–10.

Liu, Y.-S., Yong, J.-H., Zhang, H., Yan, D.-M. & Sun, J.-G. (2006), ‘A quasi-Monte Carlo method for computing areas of point-sampled surfaces’, Comput. Aided Des. 38, 55–68.


                                                                                    Average   SD
Q1.1: The modelling process is intuitive                                            2.00      0.86
Q1.2: I quickly learned how to use the tool                                         2.35      0.81
Q1.3: I easily remembered the tool's functionalities                                2.10      1.17
Q1.4: I quickly learned how to use all functionalities                              2.10      1.07
Q1.5: The tool is easy to use                                                       2.35      0.88
Q1.6: The tool is easier to use than traditional modelling tools (e.g., Blender)   1.50      1.51
Q2.1: The tool is useful                                                            2.15      0.75
Q2.2: The tool is more useful for generating large distributions than
      traditional modelling tools                                                   2.11      0.81
Q2.3: The tool is a useful addition to traditional modelling tools                  2.16      0.90
Q2.4: The distributions generated with the tool look realistic                      2.10      0.91
Q2.5: The distributions generated with the tool look as expected                    2.15      0.99
Q2.6: The distributions generated with the tool look as I wanted                    2.30      0.66
Q2.7: The tool made the modelling of large distributions more effective             2.55      0.69
Q2.8: I would use the tool frequently, if I could                                   1.95      0.94
Q3.1: Overall I am satisfied with the interface of the tool                         1.95      1.10
Q3.2: Overall I am satisfied with the functionalities of the tool                   2.10      1.12
Q3.3: Overall I am satisfied with the results achieved with the tool                2.20      0.95
Q3.4: The tool was fun to use                                                       2.55      0.60

Table 3: Questions regarding usability (Q1.1 - Q1.6), usefulness (Q2.1 - Q2.8), and user satisfaction (Q3.1 - Q3.4). The columns give the average response on a 7-level Likert scale (from -3 to +3) and the standard deviation (SD).

Manke, F. & Wunsche, B. C. (2010), Fast spatially controllable multi-dimensional exemplar-based texture synthesis and morphing, in A. Ranchordas et al., eds, ‘Computer Vision and Computer Graphics’, Vol. 68 of Communications in Computer and Information Science, pp. 21–34. http://www.cs.auckland.ac.nz/~burkhard/Publications/SpringerCCIS2009MankeWuensche.pdf.

Massive Software (2009), ‘Homepage’. http://www.massivesoftware.com.

Metoyer, R. A. & Hodgins, J. K. (2004), ‘Reactive pedestrian path following from examples’, The Visual Computer 20, 635–649.

Olsen, D. J., Pitman, N. D., Basak, S. & Wunsche, B. C. (2011), Sketch-based building modelling, in ‘Proceedings of GRAPP 2011’, pp. 119–124. http://www.cs.auckland.ac.nz/~burkhard/Publications/GRAPP2011_OlsenEtAl.pdf.

Planetside Software (2006), ‘Terragen’. http://www.planetside.co.uk/terragen/tgd/tg2faq.shtml#faq34.

Press, W. H., Vetterling, W. T., Teukolsky, S. A. & Flannery, B. P. (1992), Numerical Recipes in C - The Art of Scientific Computing, 2nd edn, Cambridge University Press. http://www.library.cornell.edu/nr/bookcpdf.html.

Reynolds, C. W. (1987), Flocks, herds and schools: A distributed behavioral model, in ‘SIGGRAPH ’87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques’, ACM, New York, NY, USA, pp. 25–34.

Shindler, M. (2008), Approximation algorithms for the metric k-median problem, Technical report, UCLA, Los Angeles, CA. http://cs.ucla.edu/~shindler/shindler-kMedian-survey.pdf.

Sung, M., Kovar, L. & Gleicher, M. (2005), Fast and accurate goal-directed motion synthesis for crowds, in ‘Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation (SCA ’05)’, ACM, New York, NY, USA, pp. 291–300.

Szirmay-Kalos, L. (2008), Monte Carlo Methods in Global Illumination - Photo-realistic Rendering with Randomization, VDM Verlag, Saarbrücken, Germany.

Treuille, A., Cooper, S. & Popovic, Z. (2006), ‘Continuum crowds’, ACM Trans. Graph. 25(3), 1160–1168.

van der Linden, J. (2001), Interactive view-dependent point cloud rendering, in ‘Proceedings of IVCNZ 2001’, pp. 1–6. http://www.cs.auckland.ac.nz/~jvan006/papers/pointcloudrendering_final.ps.gz.

Witkin, A., Kass, M. & Baraff, D. (1994), An introduction to physically based modeling, in ‘SIGGRAPH ’94, course notes #32 - An Introduction to Physically Based Modeling’, ACM SIGGRAPH. Held in Orlando, Florida, 24–29 July.

Wong, Y. Y. (1992), Rough and ready prototypes: lessons from graphic design, in ‘Posters and short talks of the 1992 SIGCHI conference on Human factors in computing systems (CHI ’92)’, ACM, New York, NY, USA, pp. 83–84.

WorldOfPolygons.com (2006), ‘CrowdIT’. http://www.crowdit.worldofpolygons.com/.

Yang, R. & Wunsche, B. C. (2010), LifeSketch - A Framework for Sketch-Based Modelling and Animation of 3D Objects, in ‘Proceedings of the Australasian User Interface Conference (AUIC 2010)’, pp. 1–10. http://www.cs.auckland.ac.nz/~burkhard/Publications/AUIC2010_YangWuensche.pdf.


Supporting Freeform Modelling in Spatial Augmented Reality Environments with a New Deformable Material

Ewald T.A. Maas1 Michael R. Marner2 Ross T. Smith3 Bruce H. Thomas4

Wearable Computer Lab, Advanced Computing Research Centre, University of South Australia

Email: [email protected], [email protected]

Email: [email protected], [email protected]

Abstract

This paper describes how a new free-form modelling material, Quimo (Quick Mock-up), can be used by industrial designers in spatial augmented reality environments. Quimo is a white malleable material that can be sculpted and deformed with bare hands into an approximate model. The material retains its shape once sculpted and allows for later modification. Projecting imagery onto the surface of the low-fidelity mock-up allows detailed prototype visualisations to be presented. This ability allows the designer to rapidly create design concept visualisations and re-configure the physical shape and projected appearance.

We detail the construction techniques used to create the Quimo material and present the modelling techniques employed during mock-up creation. We then extend the functionality of the material by integrating low-visibility retro-reflective fiducial markers to capture the surface geometry. The surface tracking allows the combined physical and virtual modelling techniques to be integrated. This is advantageous compared to the traditional prototyping process, which requires a new mock-up to be built whenever a significant change of the shape or visual appearance is desired. We demonstrate that Quimo, augmented with projected imagery, supports interactive changes of an existing prototype concept for advanced visualisation.

Keywords: Spatial Augmented Reality, Industrial Design,Deformable Surface, Quimo.

1 Introduction

This paper describes how our new material, called Quimo (Maas et al. 2011), is used to support freeform modelling for industrial designers working in a Spatial Augmented Reality (SAR) (Raskar & Low 2001) environment. This builds on our previous investigations into how SAR can be integrated into the industrial design process (Marner & Thomas 2010, Marner et al. 2009). We have also been investigating organic interfaces through the use of Digital Foam (Smith et al. 2008a,b). Quimo is a base material for prototyping that combines the natural sculpting properties of clay-like substances with the intricate detail provided by SAR. Both properties work in concert to provide real-time physical-plus-virtual feedback to the designer.

Copyright © 2012, Australian Computer Society, Inc. This paper appeared at the 13th Australasian User Interface Conference (AUIC 2012), Melbourne, Australia, January-February 2012. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 126, Haifeng Shen and Ross Smith, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.


Quimo (Quick Mock-up) is an innovative free-form modelling material designed for use with SAR. The key application for Quimo is to support a novel prototyping technique, allowing industrial designers to generate reusable low-fidelity mock-ups early in the design process. Quimo is a white malleable material that can be moulded with bare hands to produce low-fidelity physical prototypes (a Quimo mock-up is shown in Figure 1(a) (right)). Unlike clay, Quimo comes in sheet form, allowing hollow physical models to be constructed by cutting and bending the material into shape. Employing SAR to project imagery onto these low-fidelity mock-ups allows complex surface appearances to be presented. Figure 1(b) shows a highly detailed physical car model painted white, with SAR providing the coloured appearance details. Figures 1(c) and 1(d) show a physical car model sculpted using Quimo; the visual appearance is also projected using SAR. Using our new modelling process, both the virtual surface appearance and the physical shape of the mock-up can be independently customized in real-time. Quimo allows the designer to create concept visualisations during the early stages of development that are physically reconfigurable, with multiple visual appearances on one physical model.



Figure 1: Modelling a Quimo prototype based on a physical model. (a) Comparison between the original model and the Quimo model. (b) Projected graphics on the original Mini Cooper model. (c) Projected graphics on the Quimo prototype. (d) Virtual spray paint on the Quimo prototype.


This paper makes contributions to the following fields: industrial design prototyping, human-computer interaction, marker-based tracking, and spatial augmented reality. This paper has the potential for a large impact due to the fact that every commercial product goes through an industrial design process, and Quimo has the potential to radically improve the concept development phase of that process. In particular, the following four concepts are novel contributions:

1. The creation of a new material, Quimo, which supports free-form modelling and provides a suitable SAR projection surface.

2. Extending the concept phase development methodology of industrial design by allowing the presentation of feature-rich information using Quimo and SAR.

3. A reproducible process for constructing the Quimo material.

4. The integration of retro-reflective fiducial markers using glass beads into the surface of the Quimo material, which allows Quimo's surface shape to be captured in real-time.

Sections 2 and 3 of this paper begin by presenting the related work, focusing on SAR and design process methodologies. In Section 4, we describe the concept design phase in the industrial design process and motivate the need for Quimo, our new modelling material. Following this, Section 5 describes conceptually how a designer uses Quimo to create low-fidelity physical mock-ups. Section 6 describes three methods by which the low-fidelity mock-up can be transformed into a high-fidelity concept visualisation using SAR. Following this, we describe the technical considerations and implementation aspects of the Quimo material and the SAR environment. We conclude by summarizing the contributions presented in this paper.

2 Concept Mock-ups in the Industrial Design Process

Designers select a design methodology according to the design variables of the task at hand. Pugh's Total Design Model (Pugh 1991) and Pahl and Beitz's model of design (Pahl et al. 1984) are two examples of commonly applied methodologies. Whichever design methodology is chosen, concepts need to be generated that provide solutions to the design specification. This phase of design is called the concept stage. The typical process during the concept stage is to: 1) generate ideas, 2) create concepts that embody the ideas, 3) evaluate the concepts, and 4) select the most promising concepts. Using CAD and design applications to express conceptual ideas is commonplace. Creating physical mock-ups to assess the viability of a concept is also a common practice during the industrial design process. In practice, a combination of materials is needed to create mock-ups with diverse features. Since the model itself is still in the concept stage, the dimensions and appearance of the model are often not well defined. The designer explores different materials, colours, textures and dimensions repeatedly to find the best way to visualise a concept. We are investigating two aspects of the concept visualisation process: the techniques and materials used to create the physical prototypes, and the procedures used to augment the visual appearance of the mock-up.

2.1 Visualising a Mock-up

The appearance of the mock-up surface is an important aspect of the physical mock-up. Designers use paint and inks to colour and annotate their mock-ups. A disadvantage of this approach is that when changing a design, either a separate mock-up needs to be constructed or the initial mock-up must be re-painted. Although clay and polymer plasticine are able to change shape continuously, this is not possible once painted: the painted surfaces become less malleable. We consider these to be limitations of the current practices of mock-up creation in the industrial design process.

An alternative approach we are investigating is to use SAR with Physical-Virtual Tools (PVT) to alter the surface appearance by projecting virtual textures directly onto a physical mock-up (Marner & Thomas 2010, Marner et al. 2009). This technology overcomes the second limitation by allowing multiple appearances to be projected in succession onto one mock-up. The designer may digitally paint directly onto the model by interactively modifying the projected texture to alter the appearance. The texture can be saved and recalled later for further evaluation.

Although features are created in the concept stage, the actual dimensions are often not well defined. Finding the correct dimensions, shape, and appearance to visualise a concept is therefore an iterative process, moving back and forth between visualisations until the result is satisfactory. A material that facilitates this process is considered a requirement. As previously mentioned, traditional mock-up materials and techniques require a new mock-up to be built after the shape or the visual appearance is significantly changed, adding to the total time and material involved in the concept stage.

SAR can also leverage any preliminary concept sketches, textures and design ideas that are created in the early concept stage. SAR provides the ability not only to generate a new appearance, but also to present the initial ideas by projecting them directly onto the physical models. This functionality is inherently provided by the SAR system and is intended to maximize the visual information presented to the designer.

2.2 Current Mock-up Materials

Industrial designers can utilize several techniques for constructing a physical mock-up and visualising a concept. In this section we examine three major materials and techniques: rigid materials, 3D printing, and malleable materials. With the exception of very high-end colour-enabled 3D printers, these techniques are used to construct the physical mock-up only, and do not focus on the appearance. Mock-ups need to be painted to provide a finely detailed surface appearance.

Rigid materials such as urethane foam, cardboard, timber, and plastics allow for the quick creation of mock-ups. The level of detail is limited by the skill of the designer and the time spent on the creation of the mock-up. Although the level of detail of the mock-up may be quite high, the shape of the model is difficult to change once created.

3D printing technology is now available to industrial designers for use in mock-up construction. The mock-up design is expressed using CAD software. The design is then printed to create a physical object. 3D printers are a very powerful tool for creating high-fidelity mock-ups; however, there are a number of limitations. The rigid nature of the material limits physical reworking. Reprinting designs in order to correct mistakes is a common occurrence. A second limitation is the long printing time, which requires hours before the design can be assessed and further reworked. Finally, CAD modelling is typically limited to keyboard and mouse interaction. The designer is unable to feel the properties of the material during development. By removing tactile feedback, designers create their digital models based only on their familiarity with the physical materials. In particular, modelling of organic shapes without tactile feedback is difficult, especially during the concept stage where dimensions are often not well defined.



Malleable materials such as clay and polymer plasticine are well suited to the creation of organic shapes by employing free-form modelling techniques. Designers manipulate these materials with their fingers to form the mock-up. The flexibility of the material allows the designer to change the shape of the model after its initial creation. Clay and polymer plasticine overcome the problems of remodelling a design that are apparent in the previously mentioned materials and techniques. A drawback to using clay or polymer plasticine is that it is impossible to change the shape after colours have been applied. In addition, clay and polymer plasticine both suffer from a lack of structural strength, limiting their ability to create large flat suspended sheets or large hollow objects. A further disadvantage is that large clay models are particularly heavy and not appropriate for many prototype mock-ups. To overcome this, a common practice is to use them in conjunction with cardboard or timber to create the underlying structure to which the clay or polymer plasticine is applied. However, combining materials further limits the extent to which the shape can be changed afterwards, making it difficult to iterate the design without constructing a new model. Combining materials may also cause problems with producing a finish on the prototype that is consistent across the different materials used.

3 Related Work

Research in the field of industrial design has captured commonly used techniques and structured their flow into methodologies that assist designers in structuring product development (Andreasen & Hein 1987, Pahl et al. 1984, Roozenburg & Eekels 1995, Ulrich & Eppinger 1995). There are differences between these design methodologies, but a common step is that promising concepts meeting the design specification are generated in the concept stage. The concept phase is where the idea becomes a tangible reality with the first construction of a prototype. These mock-ups allow an initial evaluation of the design to be performed (Wall et al. 1992).

Augmented Reality (AR) has shown potential to enhance product design, construction, and evaluation for industrial designers. Wang et al. (Wang et al. 2009) describe an AR system for the design and evaluation of functional assemblies. AR has been used for industrial building acceptance tasks (Schkolne et al. 2001) and for comparing automotive parts with reference designs (Nolle & Klinker 2006). AR is also used in the design process; for example, Augmented Foam (Lee & Park 2005) uses a head-worn display to overlay material properties on a foam mock-up of a product.

We are particularly interested in SAR (Raskar & Low 2001), where perspectively correct computer graphics are added to surfaces using projectors placed in the environment. SAR is useful for industrial design, compared to other AR display technologies, as the designer is not required to wear or carry equipment. SAR requires physical surfaces to project onto, which can be readily found in design prototypes.

SAR has been used to digitally paint onto physical objects (Bandyopadhyay et al. 2001, Marner et al. 2009). WARP (Verlinden et al. 2003) projects onto rapid prototype models, allowing a designer to preview the appearance of different materials. Piper et al. (Piper et al. 2002) describe a projector-based system where the user can sculpt and analyze landscape forms using clay. Our Augmented Foam Sculpting (Marner & Thomas 2010) allows a designer to produce 3D virtual models by subtractively sculpting foam.

Interacting with physical objects is the goal of tangible user interfaces (Fitzmaurice et al. 1995) and has been shown to enhance user interaction (Hoffman et al. 1998, Ware & Rose 1999). Our Quimo prototype draws from this concept by providing a tangible material that is incorporated directly with projected information.

Our investigations are also concerned with the application of tracking technologies for AR. A well-established approach is to use black-and-white fiducial markers, such as those used with ARToolKit, to provide a 6DOF tracked location. A number of researchers have extended the implementation to hide the visibility of the fiducial markers. Invisible markers have been created out of transparent retro-reflective tape (Nakazato et al. 2008, 2005a,b), paint (Nota & Kono 2008) and IR fluorescent ink (Park & Park 2004). Each of these provides a fixed-shape marker, which can be grouped for capturing a simple surface.

There has been research on applying ARTags (Fiala 2005a) to capture organic shapes with multiple markers (Fiala 2005b). Our work differs in that the organic shape can change in real-time. There have also been investigations into flexible fiducial markers (Pilet et al. 2005, 2008); this work focused on tracking a number of markers that follow the contour of a non-planar shape. Unlike these systems, the tracking in this paper assumes the material that the markers are printed on can be cut.

4 Quimo in the Design Process

This paper explores a new mock-up material and the corresponding techniques employed during mock-up creation. Our primary goal has been to explore how the concept phase of the modelling methodology can be enriched by using SAR technologies to allow designers to visualise their concepts in higher detail and be provided with a more flexible modelling environment. The existing industrial design concept phase modelling process is described in Figure 2(a). We can see that significant design changes require a new prototype to be constructed. In comparison, when using Quimo and SAR for concept development, both the physical model and the appearance can be altered (Figure 2(b)) without requiring a new prototype to be constructed. Table 1 provides a feature summary that highlights the significant features we are investigating for free-form prototype development. Here we have identified how our approach provides the benefits of existing techniques that are specifically relevant for the concept development phase.

5 Creating Mock-ups with Quimo

This section describes the Quimo material and how it can be used by designers to construct physical models before projected images are used to change the appearance in a SAR environment.

Quimo is constructed using a sheet of aluminium mesh wire coated with white silicone to create a hybrid modelling surface.

Table 1: Features of material modelling

Technique              Malleable During Modelling   Malleable After Appearance Applied   Appearance Swapping
3D Printer             No                           No                                   No (paint) / Yes (SAR)
Rigid Materials        No                           No                                   No (paint) / Yes (SAR)
Malleable Materials    Yes                          No                                   No (paint) / Yes (SAR)
Quimo with SAR         Yes                          Yes                                  Yes



Figure 2: Concept modelling process: (a) traditional mock-up flow; (b) updated mock-up flow using Quimo and SAR for modelling.

The mesh wire is used for its shape-preserving property; the white silicone coating is malleable and can be moulded into shapes with a smooth finish, which makes Quimo a good projection surface.

Modelling with Quimo involves three basic processes: cutting, bonding and sculpting. Cutting Quimo is performed using regular scissors, which allows both curved and straight cuts to be easily performed. An example of cutting Quimo with scissors is shown in Figure 3(a).

The silicone material repels most types of glues and tapes, so we have investigated techniques for bonding the material. When joining two pieces of material together, we found stapling to be a robust technique, since the staples wrap around the inner aluminium mesh (see Figure 3(b)). Another successful approach for tight corners is the use of cable ties (tie-wraps) to join pieces together (shown in Figures 3(c) and 3(d)). It is also possible to glue two pieces of Quimo together by using liquid silicone as a bonding material.

Sculpting the shape of Quimo can be easily achieved using either bare hands or tools. One bare-hand technique is draping the material over an object and forming it around the object by hand (see Figures 3(e) and 3(f)). Pinching the material forms a ridge that is useful for building up features (shown in Figure 3(g)). As with clay sculpting, tools are also very effective in shaping Quimo. A ruler can be used to create straight folds by bending the material around the ruler's edge (Figure 3(h)). We also noticed that when handling Quimo, talcum powder can be used to prevent dust or dirt from sticking to the silicone surface. Since the talcum powder is white, it is also suitable for projected images, and it slightly reduces the reflectivity of the silicone surface.

6 Three Modes of Modelling with Quimo

This section describes three methods of modelling with the Quimo material. The first demonstrates how a designer can construct a physical mock-up and use a predefined virtual graphical model for projected graphics. The second method demonstrates how the designer can use PVT to paint projected imagery onto the physical Quimo prototype without a predefined virtual model. Finally, we describe how the embedded fiducial markers are used in a real-time sculpting and painting process, allowing both physical material sculpting and virtual painting to be performed for concept creation.


Figure 3: Modelling techniques used to form Quimo. (a) Cutting Quimo. (b) Bonding Quimo with staples. (c) Using cable ties to join the material. (d) Result of a cable-tie join. (e) Draping Quimo over a hand. (f) Forming the draped material into a hand shape. (g) Pinching to create a feature. (h) Employing a ruler to create a crease.


6.1 Modelling with Quimo and a Predefined Virtual Model

This method allows a high-fidelity mock-up to be created by using Quimo in conjunction with projected imagery of a predefined virtual graphical object. Our demonstration consists of a physical and a virtual model with the appearance of a Mini Cooper (shown in Figures 1(a) and 1(b)). The Mini Cooper models were reused from a previous SAR study (Marner et al. 2009). The physical Mini Cooper model (Figure 1(a) (left)) was replaced with a sculpted Quimo version (seen in Figure 1(a) (right)). During sculpting we gained a greater understanding of how the material properties affect the sculpting process. Since we are interested in creating an approximate model quickly to simulate the design process in the concept stage, we did not measure the exact dimensions. Instead, the shape was estimated and continually changed until the desired form was obtained. This process required constant comparison between the original physical model and the one sculpted from Quimo. Once an approximate model is obtained, the digital appearance of the virtual model is applied using projected SAR images.



Figure 4: Deforming the Quimo material with projected images attached to markers. (a) Sheet of Quimo with projected images. (b) Folding the material into a cube shape.

Since the original Mini Cooper model and the one constructed from Quimo have different dimensions, we also applied the existing SAR calibration technique to optimize the alignment of the projections. This process requires the user to select a series of predefined points using a cursor. Figure 1(b) shows the projected imagery on the original Mini Cooper model. Figure 1(c) demonstrates how a low-fidelity mock-up created out of Quimo is transformed into a high-fidelity representation by leveraging the projected imagery.

6.2 Painting on Quimo Prototypes Without a Virtual Model

Projected imagery can still be used on a Quimo mock-up, even without an accurate underlying virtual model. Digital airbrushing (Marner et al. 2009) with PVT can be used to interactively create different appearances on the mock-up. An example of this technique is shown in Figure 1(d), where the surface of the Mini has been sprayed with virtual spray paint. This was accomplished using a bounding box as a stand-in for accurate virtual geometry. As the "paint" is applied using a projector, it appears to fall at the correct locations on the mock-up. Although this visualisation application supports the designer in changing the projected imagery iteratively, the model is required to remain stationary and keep the same shape for the visualisations to remain accurate.

6.3 Simultaneous Physical and Virtual Modelling

In the first mode we made the assumption that a virtual model with the correct visual appearance exists before the physical material is sculpted. During concept visualisation a virtual model may not be available; to address this situation we have considered how the Quimo material can be used in conjunction with projected graphics when no pre-existing virtual model is available. Our example also considers how the material can be iteratively deformed, digitally painted and re-deformed while maintaining a correctly registered projected appearance. We have addressed both these aspects by incorporating a customized tracking solution into the Quimo material. Our tracking solution employs low-visibility markers that are integrated into the material surface of Quimo. The location of each marker is tracked and used to update the digital information to compensate for the physical deformations.

For this example, we created a 300 mm x 400 mm sheet of Quimo with twelve low-visibility markers integrated into the surface. The markers are arranged in a 4 x 3 grid so that a simple cube can be constructed. The designer starts the modelling process by digitally spraying a coloured appearance onto the surface. Using scissors, the designer cuts away the unwanted sections of the material, allowing the material to be folded into a cube shape. During the folding process, the location of the markers changes from being on a planar surface to the faces of the cube. The projected graphics that were initially sprayed

onto the surface are now correctly displayed on the faces of the cube. The steps of this cube example are shown in Figure 4. It is possible to erase and change the virtual annotations, or to save and switch between virtual paint appearances.

Further design alterations are also possible using digital paint and textures with PVT to further annotate the model. One advantage of this approach is that the process of changing the visual appearance and the shape of the object does not require the designer to build additional prototypes, as is often required when using traditional materials. The goal here is to reduce the time and material needed during concept visualisation while also increasing the functionality to maximize the visualisation possibilities.

With markers integrated into the surface, we have been able to maintain the ability to cut and bend the material into the desired shape. In the current implementation the designer is limited to bending and cutting the material only at the edges between the markers, to prevent destroying a marker and compromising its tracking. When the resolution of the camera increases and the algorithms used to track a bent marker are improved, the size of the markers can become smaller, allowing for a finer deformation of the material.

7 Surface Capture Techniques

Capturing the geometry of physical objects in real-time is a difficult problem. Several existing technologies were considered for implementing a tracked Quimo surface. However, we were unable to find one solution that meets all of our requirements. This section describes our requirements and the path we followed to achieve a modified tracking system that makes it possible to project registered images onto a deformable surface.

The following is a list of requirements we identified in order to determine which technique to use or develop for surface capture:

1. The designers need to be able to cut and bend the material using a pair of scissors.

2. The outside surface area is white. Any tracking of the shape and position may not interfere with the ability to project accurate colours and textures.


Figure 5: (a) Traditional ARToolKit fiducial markers printed in black and white. (b) Quimo material with embedded low-visibility IR markers shown under IR light. (c) Virtual cubes projected directly onto the Quimo material. (d) Quimo material as seen by the human eye (without IR light).



3. Real-time tracking of position and shape.

We observed the following limitations of existing technologies for our application. Electronics cannot easily be integrated into the material, since it will be cut regularly during use; this rules out embedded touch screens. Commonly employed fiducial markers, such as those used in ARToolKit, provide a promising solution. However, in their unmodified form the black-and-white markers compromise the ability to project onto the surface. Structured light and laser scanners can be used to capture the shape of an object, but are unsuitable when the object is being deformed in real-time.

To overcome these limitations we have integrated low-visibility fiducial markers into the Quimo material. Low-visibility markers can be constructed using infrared- or ultraviolet-sensitive materials.

Low-visibility IR and UV markers are made of a material that fluoresces in the IR or UV spectrum. A camera that is sensitive to these wavelengths can then distinguish markers that are not visible to the human eye. Since UV light can cause health problems, we have chosen to use IR for our implementation. However, IR markers are normally made out of paint or tape, which cannot be applied to the Quimo surface. We therefore developed a novel technique for implementing the IR markers on the Quimo surface.

We incorporate a pattern of glass beads into the silicone material while it is curing. The glass beads create a retro-reflective surface, where incident light is reflected back towards the light source. The markers are identified using a camera with an IR light source mounted close to the camera lens. Figure 5(d) shows how the markers appear under visible light and Figure 5(b) demonstrates the view under IR light.
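To illustrate why retro-reflectivity makes detection straightforward, the following Python/OpenCV sketch thresholds an IR camera frame so that the bright bead regions stand out before marker decoding. This is our illustration only; the actual system hands the image to a marker tracking library, and the threshold value and the OpenCV 4 API used here are assumptions.

import cv2

def marker_candidate_regions(grey_ir_frame, threshold=200):
    # Retro-reflective beads appear as bright blobs under the camera-mounted IR light.
    _, binary = cv2.threshold(grey_ir_frame, threshold, 255, cv2.THRESH_BINARY)
    # Contours of the bright blobs approximate the candidate marker regions.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return binary, contours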

8 System Implementation Details

This section describes three aspects of the Quimo system implementation. Firstly, we describe how the Physical-Virtual Tools are used to provide virtual spray painting on Quimo. We then discuss the tracking solution and how it was applied to create a deformable surface from grouped fiducial markers. Following this, we describe the hardware details of our SAR environment.

8.1 Physical-Virtual Tools

We have previously demonstrated how physical objects can be augmented with digital paint using our Physical-Virtual Tools (Marner et al. 2009). A major limitation of this and other SAR systems is the need for 3D virtual geometry to represent the physical objects being projected onto. To combine free-form sculpting using Quimo with digital airbrushing, we have extended our airbrushing system (Marner et al. 2009).

We expanded the previous implementation to provide support for Quimo modelling. In the updated implementation, physical objects are digitally painted using a bounding-cube texture map. The physical object is placed inside this volume, and the designer paints as normal. The nature of projected light ensures the virtual paint is placed in the correct location on the object, even without a corresponding virtual model. Figure 6 shows how the textured bounding box, when drawn from the point of view of the projector, correctly maps digital paint onto the mock-up. This works well when the bounding box closely fits the sculpted model, but projections would not align as well if the bounding box was significantly larger than the model. Figure 1(d) shows the result of airbrushing on the car prototype.

Figure 6: Left: the virtual bounding box, drawn from the point of view of the projector. The location of the physical mock-up is shown inside the bounding box. Right: the projected textures map correctly onto the physical mock-up.

Using the same cube-map texture, the paint could also be applied to a CAD model of the design at a later stage of development. However, the goal of this technique is to preview different material properties early in the design process, before CAD models have been created.
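The bounding-cube lookup can be sketched as follows (a minimal illustration under an assumed axis-aligned box and our own face numbering, not the system's actual rendering code): a point on the bounding box is converted into (face, u, v) cube-map coordinates, which is the mapping that makes paint projected through the box land at the corresponding texel.

import numpy as np

def cube_map_texel(p, box_min, box_max):
    centre = (box_min + box_max) / 2.0
    d = (p - centre) / ((box_max - box_min) / 2.0)   # direction in [-1, 1]^3
    axis = int(np.argmax(np.abs(d)))                 # dominant axis selects the face
    face = 2 * axis + (0 if d[axis] > 0 else 1)      # assumed face numbering
    u_axis, v_axis = [(1, 2), (0, 2), (0, 1)][axis]
    u = (d[u_axis] / abs(d[axis]) + 1.0) / 2.0       # normalise to [0, 1]
    v = (d[v_axis] / abs(d[axis]) + 1.0) / 2.0
    return face, u, v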

8.2 Deformable Surface Implementation

The embedded fiducial markers are used to support more complex interactions. At the simplest level, Quimo designs are tracked, allowing the projection to be updated as the design is moved. A benefit of embedding fiducial markers into the material is that the shape of the surface can be modelled, which provides a more accurate projection. In the case of the airbrushing tool, the digital paint is registered to the physical surface rather than the bounding-box region, even as the material is bent, cut, and sculpted.

A naive approach to implementing the deformable surface would be to simply treat each fiducial marker as a tile that can be painted. However, the reliability of the tracking is reduced by the low contrast in the images obtained from the camera using the embedded markers. This is exacerbated by the user obscuring the camera's view of the markers with their hands or tools. The result is a projected image that flickers and is unusable to the designer.

To overcome this, we treat the entire sheet as a deformable polygon mesh, with vertices placed between each fiducial marker. The position of each vertex is determined by its neighbouring markers. During tracker updates, each visible marker contributes a position for each vertex it affects. These positions are then averaged for each vertex to obtain its final location. With this approach, four markers can contribute to each of the inner vertices and two markers to each edge vertex, with the corners the only vertices without this redundancy. If we constrain the types of deformation that can be performed, we can add redundancy to these markers using more distant neighbours. Figure 4 illustrates our deformable surface implementation.
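A sketch of this vertex-averaging update is shown below, with an assumed data layout (marker poses as 4x4 matrices, per-vertex offsets expressed in each neighbouring marker's frame); the function and variable names are ours. Occluded markers simply drop out of the average, which is what provides the robustness described above.

import numpy as np

def update_mesh_vertices(marker_poses, vertex_neighbours, vertex_offsets):
    vertices = {}
    for v, markers in vertex_neighbours.items():
        predictions = []
        for m in markers:
            pose = marker_poses.get(m)               # None if the marker is occluded
            if pose is not None:
                # Predicted vertex position from marker m's tracked pose.
                p = pose @ np.append(vertex_offsets[v][m], 1.0)
                predictions.append(p[:3])
        if predictions:                              # occluded markers drop out
            vertices[v] = np.mean(predictions, axis=0)
    return vertices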

8.3 SAR Environment

Our environment is based on a growing framework designed to support large-scale, interactive SAR systems. The framework is written in C++, using OpenGL for 3D graphics. The applications described in this paper employ a generic desktop PC with Nvidia Quadro 3800 graphics hardware. Two projectors are used, each at a resolution


of 1280x800. The PVT are tracked using a Polhemus Patriot1 magnetic tracking system. The embedded markers on the Quimo material are tracked using a Sony XCD-X710CR FireWire camera and the ARToolKitPlus (Wagner & Schmalstieg 2007) tracking library.

The projector calibration algorithm we employed is described by Bimber and Raskar (Bimber & Raskar 2005) and involves manually finding known 3D points in the projector image. Details of the digital airbrushing algorithm have been described in our previous paper (Marner et al. 2009).

9 Quimo Material Implementation

We have previously described the materials used to construct Quimo (Maas et al. 2011). This section describes the techniques used to actually build a sheet of Quimo. Quimo is constructed using two materials: Smooth-On EcoFlex 302 and Amaco Wireform (aluminium mesh wire). The mesh wire is regularly used for freeform modelling, but lacks the white surface that is required for SAR. EcoFlex 30 fulfils this requirement: it is lightweight, cures into a smooth surface, and can be stretched at least as much as the mesh wire can. It also allows for integrating the markers in the material.

A variety of construction techniques were explored to coat a sheet of mesh wire with the EcoFlex material. Our best results were achieved by using a three-layer approach. Figure 7 illustrates the layered structure of Quimo.

First, two flat sheets of EcoFlex are created by pouring the EcoFlex liquid into an open-topped box of the desired dimensions (shown in Figure 8(a)). The height of the box defines the thickness of the sheet. A thickness of 1.3 mm per sheet balances mesh-wire visibility, weight and flexibility. After mixing the two components of EcoFlex, it takes approximately 4 hours to cure. It is then possible to work with the created sheets.

Next, liquid EcoFlex is added on top of one of the two sheets and distributed evenly (Figure 8(e)). A flat sheet of mesh wire is then added to this liquid layer of EcoFlex (Figure 8(f)). The liquid layer of EcoFlex and the mesh wire form the middle layer of the material. The top layer of Quimo is made with the second sheet of cured EcoFlex that was created before. The liquid EcoFlex in the middle layer bonds all pieces together. To squeeze out excess liquid we added weight on the top layer. After another 4 hours the middle layer is cured and the Quimo material is ready for use.

9.1 Embedding Low Visibility Fiducial Markers

The description above creates the material itself; it does not yet incorporate the low-visibility fiducial markers. The glass beads that make up the markers can be added during or after the creation of Quimo.

Adding the glass beads during the creation of Quimo results in markers that are integrated into the material and are difficult to remove. However, the glass spheres become partially submerged in the silicone, which reduces the retro-reflectivity.

Adding the glass beads after the creation of the Quimo material results in beads that sit on top of the material. The problem with this approach is that the glass beads can be rubbed off with firm finger pressure; however, the retro-reflectivity is well maintained with this process.

The glass beads have to be added to one of the two initial silicone sheets while it is still curing. We wait approximately 45 minutes after mixing, when the liquid silicone starts to change from liquid to solid.

1 http://www.polhemus.com
2 http://www.smooth-on.com

Figure 7: Cutaway showing the layers that comprise Quimo. The layers are bonded while the centre EcoFlex layer is liquid.

Add the beads too early and they are absorbed and lose all retro-reflectivity; add them too late and they sit on top of the Quimo, which makes them easy to rub off. The correct timing is determined by testing the viscosity, as shown in Figure 8(b). When the silicone no longer sticks, the template and the beads can be added. In addition to correct timing, the beads need to form a recognizable marker pattern that can be used with ARToolKitPlus. We used a 3D printer (Dimension uPrint Plus3) to construct a template which was used to form the marker shape. The template can be removed once the material has cured.

10 Limitations

There are a few limitations that we have observed during our investigations. Firstly, during sculpting we noticed that the embedded aluminium mesh wire experiences some metal fatigue: bending Quimo back and forth in the same place will result in the wires breaking. When this occurs, it is not noticeable on the outside of the material, since it is still covered in silicone, but the broken mesh wire does reduce Quimo's ability to retain its shape.

Our tracking solution uses a grid of ARToolKit markers to allow the surface shape to be captured. This is intended to demonstrate the feasibility of embedding low-visibility fiducial markers into the Quimo material. A more advanced approach would be to use random dot markers (Uchiyama & Saito 2011), which have been extended to support tracking of deformable surfaces (Uchiyama & Marchand 2011).

11 Conclusion and Future Work

In this paper we have presented Quimo, a malleable modelling material that allows traditional hand modelling and projected digital imagery to be combined. We have described how the concept design phase used by industrial designers can be enriched using Quimo and SAR, allowing for continuous changes of the visual appearance and shape without the need to build new mock-ups. We also describe the interaction techniques we explored by creating two physical prototypes to demonstrate tracked virtual projections on Quimo's surface. The final technique presented allows a user to deform the shape of the material while the digital projected graphics are updated to maintain the correct visual appearance. Finally, the implementation details of Quimo are presented, describing how the material is fabricated and how retro-reflective glass beads are embedded directly into the silicone material to create fiducial markers.

3 http://www.dimensionprinting.com



Figure 8: Material implementation. (a) Mould used to form the silicone sheets of the Quimo material. (b) Correct viscosity reached before adding glass beads for markers. (c) Adding glass beads using a template to form marker shapes. (d) Removing the template after curing. (e) Adding a layer of EcoFlex to join the sheets. (f) Adding the aluminium mesh wire layer, to provide rigidity, and adding the top sheet.


The primary use of Quimo is for creating mock-ups in the concept design phase of the industrial design process. However, since Quimo allows for continuous surface capture and reconstruction, it is possible to save the entire process of mock-up creation. A designer can then improve their skills by reviewing the process in retrospect. Saving the process of mock-up creation also allows for the comparison of the design process between novices and experts; the differences can then be distilled into instructional guidelines. Additionally, the virtual models obtained from the deformed Quimo surface can be saved and imported into CAD software for later use in the design process. This reduces the effort involved in recreating concepts in CAD software that have already been evaluated with Quimo.

References

Andreasen, M. M. & Hein, L. (1987), Integrated Product Development, IFS Publications, Springer, NY.

Bandyopadhyay, D., Raskar, R. & Fuchs, H. (2001), Dynamic shader lamps: Painting on movable objects, in 'IEEE and ACM International Symposium on Mixed and Augmented Reality', pp. 207–216.

Bimber, O. & Raskar, R. (2005), Spatial Augmented Reality: Merging Real and Virtual Worlds, A K Peters, Wellesley.

Fiala, M. (2005a), ARTag, a fiducial marker system using digital techniques, in 'Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05) - Volume 2', IEEE Computer Society, Washington, DC, USA, pp. 590–596.

Fiala, M. (2005b), The SQUASH 1000 tangible user interface system, in 'Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality', ISMAR '05, IEEE Computer Society, Washington, DC, USA, pp. 180–181.

Fitzmaurice, G. W., Ishii, H. & Buxton, W. A. S. (1995), Bricks: Laying the foundations for graspable user interfaces, in 'Proceedings of the SIGCHI Conference on Human Factors in Computing Systems', ACM Press/Addison-Wesley Publishing Co., Denver, Colorado, United States, pp. 442–449.

Hoffman, H., Hollander, A., Schroder, K., Rousseau, S. & Furness, T. (1998), 'Physically touching and tasting virtual objects enhances the realism of virtual experiences', Virtual Reality 3(4), 226–234. doi:10.1007/BF01408703.

Lee, W. & Park, J. (2005), Augmented foam: A tangible augmented reality for product design, in P. Jun, ed., 'Mixed and Augmented Reality, 2005. Proceedings. Fourth IEEE and ACM International Symposium on', pp. 106–109.

Maas, E., Marner, M. R., Smith, R. T. & Thomas, B. H. (2011), Quimo: A deformable material to support freeform modeling in spatial augmented reality environments, in 'Poster Sessions: Proceedings of the IEEE Symposium on 3D User Interfaces', Singapore.

Marner, M. R. & Thomas, B. H. (2010), Augmented foam sculpting for capturing 3D models, in 'IEEE Symposium on 3D User Interfaces', Waltham, Massachusetts, USA.

Marner, M. R., Thomas, B. H. & Sandor, C. (2009), Physical-virtual tools for spatial augmented reality user interfaces, in 'International Symposium on Mixed and Augmented Reality', Orlando, Florida.

Nakazato, M., Kanbara, M. & Yokoya, N. (2008), Localization system for large indoor environments using invisible markers, in 'ACM Symposium on Virtual Reality Software and Technology', pp. 295–296.

Nakazato, Y., Kanbara, M. & Yokoya, N. (2005a), Localization of wearable users using invisible retro-reflective markers and an IR camera, in 'Proc. SPIE Electronic Imaging', Vol. 5664, pp. 1234–1242.

Nakazato, Y., Kanbara, M. & Yokoya, N. (2005b), Wearable augmented reality system using invisible visual markers and an IR camera, in 'IEEE International Symposium on Wearable Computers', pp. 198–199.

Nolle, S. & Klinker, G. (2006), 'Augmented reality as a comparison tool in automotive industry', Mixed and Augmented Reality, 2006. ISMAR 2006. IEEE/ACM International Symposium on, pp. 249–250.

Nota, Y. & Kono, Y. (2008), Augmenting real world objects by detecting invisible visual markers, in 'UIST'.

Pahl, G., Beitz, W. & Wallace, K. (1984), Engineering Design: A Systematic Approach, Springer.

Park, H. & Park, J. (2004), Invisible marker tracking for AR, in 'International Symposium on Mixed and Augmented Reality', pp. 272–273.


Pilet, J., Lepetit, V. & Fua, P. (2005), Augmenting deformable objects in real-time, in 'Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality', ISMAR '05, IEEE Computer Society, Washington, DC, USA, pp. 134–137.

Pilet, J., Lepetit, V. & Fua, P. (2008), 'Fast non-rigid surface detection, registration and realistic augmentation', International Journal of Computer Vision 76, 109–122.

Piper, B., Ratti, C. & Ishii, H. (2002), Illuminating Clay: A 3-D tangible interface for landscape analysis, in 'Proceedings of the SIGCHI Conference on Human Factors in Computing Systems', ACM, Minneapolis, Minnesota, USA, pp. 355–362.

Pugh, S. (1991), Total Design: Integrated Methods for Successful Product Engineering, Addison-Wesley.

Raskar, R. & Low, K. (2001), Interacting with spatially augmented reality, in 'Proceedings of the 1st International Conference on Computer Graphics, Virtual Reality and Visualisation', ACM, Camps Bay, Cape Town, South Africa, pp. 101–108.

Roozenburg, N. F. M. & Eekels, J. (1995), Product Design: Fundamentals and Methods, Wiley, NY.

Schkolne, S., Pruett, M. & Schröder, P. (2001), Surface drawing: Creating organic 3D shapes with the hand and tangible tools, in 'Proceedings of the SIGCHI Conference on Human Factors in Computing Systems', ACM, Seattle, Washington, United States, pp. 261–268.

Smith, R. T., Thomas, B. H. & Piekarski, W. (2008a), Digital foam interaction techniques for 3D modeling, in 'VRST '08: Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology', Bordeaux, France, pp. 61–68.

Smith, R. T., Thomas, B. H. & Piekarski, W. (2008b), Tech note: Digital foam, in 'IEEE Symposium on 3D User Interfaces', Reno, NV, pp. 35–38.

Uchiyama, H. & Marchand, E. (2011), Deformable random dot markers, in 'Poster Sessions: Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality', Switzerland.

Uchiyama, H. & Saito, H. (2011), Random dot markers, in '2011 IEEE Virtual Reality Conference (VR)', IEEE, pp. 35–38.

Ulrich, K. T. & Eppinger, S. D. (1995), Product Design and Development, McGraw-Hill, NY.

Verlinden, J., de Smit, A., Peeters, A. & van Gelderen, M. (2003), 'Development of a flexible augmented prototyping system', Journal of WSCG.

Wagner, D. & Schmalstieg, D. (2007), ARToolKitPlus for pose tracking on mobile devices, in 'Proceedings of 12th Computer Vision Winter Workshop (CVWW '07)'.

Wall, M. B., Ulrich, K. T. & Flowers, W. C. (1992), Evaluating prototyping technologies for product design, in 'Research in Engineering Design', Vol. 3, pp. 163–177.

Wang, Z., Shen, Y., Ong, S. K. & Nee, A. Y. (2009), Assembly design and evaluation based on bare-hand interaction in an augmented reality environment, in 'CyberWorlds, 2009. CW '09. International Conference on', pp. 21–28.

Ware, C. & Rose, J. (1999), 'Rotating virtual objects with real handles', ACM Transactions on Computer-Human Interaction 6(2), 162–180.


Contributed Posters


Service History: The Challenge of the 'Back button' in Mobile Context-aware Systems

Annika Hinze¹, Knut Muller¹,², George Buchanan³

¹ University of Waikato, Hamilton, New Zealand. Email: [email protected]

² Humboldt University Berlin, Germany. Email: [email protected]

³ City University London, United Kingdom. Email: [email protected]

Introduction This paper discusses the challenge of providing effective interaction for navigating a user's browsing history in context-aware mobile services. Mobile systems are often composed of a number of services, and navigation must be understood both in terms of a single service and of movement between different services over time. The semantics of simple navigational steps such as "back" and "forward" becomes much more complex than on traditional interfaces, and there is also a need to understand the difference between navigation based on the user's actual physical context (i.e. their real location) and exploratory navigation at a different virtual site. The challenges of composition and geography both need to be effectively addressed to build a complete and usable history of a user's interaction with the system.

Interaction between collaborative services There are two established methods for keeping track of a user's activity history. (1) For most desktop software, the history is understood to be the sequence of system states that occurred in their interaction. (2) Web browsers include tabs and other props, but keep to the tradition of each document view (tab or window) maintaining an independent history. We found that familiar expectations may be misleading when mobile systems use collaborative context-aware services (Hinze et al. 2011). We have shown that the tab-based solution of independent histories, commonly found in web browsers, is inadequate in a service context. This is because changes between services are ignored and no overall history would exist. The history solution (undo/redo) of stand-alone, discrete software is also not sufficient, as the existence of different services is ignored and one single history does not allow distinction between services and user context. Similar to the motivation for the iPod wheel access, we believe that no history entry should ever be deleted: users need to feel certain that they can bring up a previously seen page.

We aim to design a history and back-button behaviour that allows access to all previously seen pages. We need to distinguish how a user accessed a page (e.g., a direct physical visit to a sight vs. only browsing to it). User context, such as movements in the real world and the user's location, needs to be considered.

Contextual Histories Histories depending on a given context have been previously suggested (e.g., in an electronic whiteboard (Igarashi et al. 2000), Greenberg's browser history (Kaasten & Greenberg 2001) and the history tree plug-in). The electronic whiteboard is the only system that records all user interaction. However, it does not need explicit interactions to change context/focus, as all information is available at one glance. Browser-based approaches have to deal with overlap in tabs and windows: an explicit change of focus is necessary but not recorded in any of the histories. The back/undo buttons refer to the current context of the tab or whiteboard section. The whiteboard also supports an overall undo/redo. The history tree works within traditional Firefox, where back/forward buttons refer to the current tab; the tree refers to each window. The whiteboard is the only solution with a tree that covers all sections. A distinction between location change and browsing does not exist in any of the systems. Context has only two dimensions: time (order) and location (tabs/windows or section). Services may be seen as tabs, windows or sections. Additional contexts, such as physical location, as well as interactions for changing focus (e.g., switching tabs), have to be included.

We require a history that keeps all information (such as the history tree within each window) but gives layered access, as for the whiteboard. The limited size of a mobile screen reduces the usability of navigating large history trees. Strictly separate histories for each service (as in Greenberg's approach) are not sufficient. Information about physical location adds yet another dimension to the overall history.

History design We now identify the implications of modes and contexts for the history and navigation elements (back/forward). We propose to keep one history for all interactions of the user and to define context-based views onto this history. The concept of views onto complex data is borrowed from databases, whereas the principles of drilling down into certain aspects of the data (context) are taken from data warehouses. This implies that no elements should be deleted from the history but only rendered invisible for a certain view. Each view may contain one or several contexts for selecting and sorting the history. We first identify the contexts for user interaction and display and the challenges these create (see Figure 1). Some information can serve either as items in the history or as context, i.e., as a view dimension onto the history. The order of page opening is typically used for histories in web browsers. We also want the order of page views to be available.

Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia

89

Page 104: Australian Computer Society · 2020. 5. 27. · Table of Contents Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January

Data                Type           Detail
Time of access      Item/Context   Time or order of page openings/page views
Page identity       Item/Context   IS pages are easily identifiable. RD pages can be identified by the POI they relate to or by the items on the list. Identity of maps with location markup is more complex.
User model          Context        Physical, virtual and interacting user
Display mode        Context        Virtual mode and actual mode
Physical location   Item/Context   Coordinates of the user location, identity of place, coordinates of POI
Virtual location    Item/Context   Coordinates of POI visited in hyperspace, identity of place
Service             Item/Context   Identity of collaborating services
Additional          Context        User-defined contexts such as business/private travel or time of the day

Figure 1: History Elements and Contexts

The identity of pages needs to be analysed carefully regarding its meaning for different services. One may also design means to collapse sequences of similar pages (e.g., referring to the same place). Different user models (virtual, physical and interacting), as introduced in (Hinze et al. 2006), may allow for a more structured history. Modes in the history could distinguish between places that have been visited physically and virtually. The user history may offer a view according to the user's physical or virtual location. The history should allow a service-based view (similar to the layered history in the whiteboard).

Ideally, the complexity of the navigation we present here should be dramatically reduced for the user, and the complete history of their interaction should also be recoverable. At its simplest, linear access should enable retrieval of information in the order the user experienced it before. Linear structures are insufficient for internal storage of the history; a complex history tree needs to be kept. However, given the screen size, a context-driven linear view (selecting only items pertinent to a given context) may provide a sufficient alternative. Collapsing and expanding information based on context could provide richness without overburdening the user.

History variations can be distinguished based on context dimensions. A simple temporal history of page visits does not include duplicate visits. Page identity may be made explicit (repeated visits to the same page) or visits may be treated as separate. Different services may be distinguished: one can easily see how local histories for each service may be offered. A distinction by time and physical user location means that all interactions at the same location are aggregated. This mapping needs either a distance threshold or semantic information about POIs. Partial time per location then allows for ordering of page views in each location.

We propose to see the history as a multi-dimensional hyper-space cube of access data that can be accessed using aggregation methods known from data warehouses (drill-down and roll-up) and views from databases. Drill-down opens new dimensions; roll-up collapses dimensions. Slice and dice are the equivalent operations for dynamically changing the combinations of dimensions that are being viewed. The dimensions and views are determined by the context.
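To make the cube idea concrete, the following minimal sketch (not taken from the TIP prototypes; the event fields and context names are assumptions for illustration) keeps one universal history and derives context-based views and roll-ups from it:

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Dict, List, Optional

    @dataclass
    class HistoryEvent:
        """One entry in the single universal history; entries are never deleted."""
        time: datetime
        page: str
        service: str                      # collaborating service that produced the view
        physical_location: Optional[str]  # where the user actually was, if known
        mode: str                         # 'physical' visit vs. 'virtual' browsing

    class HistoryCube:
        """A universal interaction history with context-based views.

        Views only hide entries (like a database view); roll-up aggregates
        along one context dimension, as in a data warehouse.
        """
        def __init__(self) -> None:
            self.events: List[HistoryEvent] = []

        def record(self, event: HistoryEvent) -> None:
            self.events.append(event)

        def view(self, **context) -> List[HistoryEvent]:
            # e.g. view(service='sights', mode='physical') is a slice of the
            # cube; a per-view 'back' button walks this filtered list backwards.
            return [e for e in self.events
                    if all(getattr(e, dim) == val for dim, val in context.items())]

        def roll_up(self, dimension: str) -> Dict:
            # Collapse one dimension into aggregate counts per value.
            counts: Dict = {}
            for e in self.events:
                key = getattr(e, dimension)
                counts[key] = counts.get(key, 0) + 1
            return counts

A per-view "back" step then walks the filtered list, while the complete history remains recoverable from the unfiltered event sequence.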

Prototypes We implemented two functional prototypes to evaluate our interface and user interaction design for the tourist information system TIP. In the first one, we used a history with temporal order where the services are indicated by the symbols used (two context dimensions); details can be found in (Thunack 2008). A second prototype explored the aggregation of history items (e.g., by location). Other options are aggregations along the context dimensions as discussed before, for example, aggregation of history items in a temporal grouping or by service. We evaluated the ability of users to retrace their trip

Figure 2: Aggregation on map (left) and on history (right)

using the different history designs. Participants were asked to find the point of interest they visited after visiting another point of interest. The overwhelming majority of the participants found it easier to retrace their steps using the new history system. This was because they only needed to follow the heading links to see the locations that were actually visited. In the history list, the participants were forced to look through the interactions and remember which ones constituted an actual visit to the point of interest, as opposed to simply viewing information about the place via a recommendation. Detailed information is available in (Campbell 2010).

Conclusion This paper extends the existing research on the navigation of a user's history of interaction in hypertext. We are concerned with software that is built on heterogeneous service collaboration (as opposed to, e.g., the homogeneous operation of a web browser), and that includes mobility, location-dependency and the use of small screens, in addition to hypertext links. We wish to capture a full history of a user's interactions with the system, so that previously seen information can be recalled in the order encountered by the user. We seek a design that provides a single universal history giving the user different views of (projections onto) the data, with minimal effort. We propose to use techniques known from databases and data warehouses to provide context-aware access to complex history data. Further research is needed on how best to present the history information on a small screen.

References

Campbell, W. (2010), There and back again: Navigating location-sensitive interaction history, Honours thesis, University of Waikato, New Zealand.

Hinze, A., Malik, P. & Malik, R. (2006), Interaction design for a mobile context-aware system using discrete event modelling, in 'Australasian Computer Science Conference (ACSC 2006)', Hobart, Australia, pp. 257–266.

Hinze, A., Mueller, K. & Buchanan, G. (2011), Service history in mobile context-aware systems, Working paper, Waikato University.

Igarashi, T., Edwards, W. K. & LaMarca, A. (2000), An architecture for pen-based interaction on electronic whiteboards, in 'Proceedings of the Working Conference on Advanced Visual Interfaces', ACM, pp. 68–75.

Kaasten, S. & Greenberg, S. (2001), Integrating back, history and bookmarks in web browsers, in 'CHI '01 Human Factors in Computing Systems', pp. 379–380.

Thunack, C. (2008), Konzeption einer Benutzerschnittstelle für ein kontextsensitives Informationssystem [Design of a user interface for a context-sensitive information system], Master's thesis, FHTW Berlin, Germany.


An investigation of factors driving virtual communities

Jonathon Grantham, Cullen Habel

Business School, University of Adelaide, Adelaide 5005, South Australia

Email: [email protected], [email protected]

Abstract

Adopting the view that virtual communities are an important part of today's social fabric, this study investigates the relationships between different factors commonly used to measure people's attitudes towards virtual communities. The study reviews existing research in both traditional non-virtual and virtual communities to build a more inclusive factor model which includes the idea of motivation and de-motivation found in Herzberg's motivation-hygiene theory.

Keywords: virtual communities, drivers, factors, motivation.

1 Introduction

The last 20 years have seen the rapid growth of the global computer network known as the internet. Early literature surrounding the internet regarded this new technology as revolutionary, not only in its technical innovativeness but also in its broad social and political implications (Benedikt 1991, Gore 1991, Negroponte 1995). Since the first Bulletin Board Systems (BBS) in 1978 there has been a dynamic and ever-changing range of different technologies allowing the formation of virtual communities (Jones 2003). Currently virtual communities exist through the use of chat servers, instant messaging, list servers, social networking websites, dating websites and even computer games. A virtual community may mean many things to many different people; as a result it is hard to find a definition that would be widely accepted (Komito 1998). The word "community" alone has 94 different known definitions (Hillery 1955). For the purpose of this paper, a virtual community is defined as an aggregation of people (whether individuals, groups or businesses) who have common interests and values and interact at least partially through the use of the internet and other such technologies.

2 Literature

The versatility of virtual communities has attracted interest from researchers in a large number of disciplines. Researchers in marketing, education, social sciences, computer science, medicine and information systems have all built theories around why virtual communities exist and how to use them productively.

2.1 Technology

There is a tendency to categorise systems by technology of communication, using categories such as chat rooms, instant messaging, online retail, news groups, message boards, social networking websites, multi-user dungeons and dating websites (Utz 2000). This method of categorisation has suffered as many virtual communities have developed to use more than one technology. Technology should be thought of as an enabler of human interaction, not a definer.

3 Proposed model

In 1959 Herzberg theorised that there were both positive and negative factors motivating employees. Herzberg argued that the factors that created motivation were not the opposite of the factors that created de-motivation. The same theory can be applied to users' attitudes towards a virtual community. The presence of a factor like "ease of use" will not inherently create a positive attitude towards a community or system; however, a website which is difficult to use will have a dramatic negative effect on the user.

The model proposed in this paper is broken into two groups of factors. The first group contains two sub-groups called 'emotional connection' and 'perceived benefits'. These are positive factors driving users to create and participate in a specific virtual community. The second side of the model contains the negative factors, which include 'effort' and 'evidence'. These factors will not contribute to creating a virtual community by themselves; however, they will weigh against a person's joining a virtual community if they are not present. It was hypothesised in this study that members of a virtual community would attach greater importance to positive factors than to the absence of negative factors in their decision to remain in a community.

3.1 Emotional Connection

For the purpose of this paper, emotional connection factors are positive factors in that they create a positive attitude towards a virtual community. The proposed model breaks emotional connection into four driving factors: 'common values', 'common behaviours', 'common interests' and 'existing relationships'.

3.2 Perceived benefits

Smith (1948) distinguishes between intrinsic and extrinsic values. Intrinsic values are those which are internal and cannot be directly related to a monetary value; in this model, intrinsic value is represented by enjoyment. An extrinsic value is a value which can be directly related to a monetary value; in this model it is simply referred to as 'value'.

The model used in this study breaks perceived benefits into two driving factors: 'perceived enjoyment' and 'perceived value'.

3.3 Effort

Technology and cost factors have long been established as affecting people's attitudes towards virtual communities. Herzberg's theory would suggest that they are both negative factors: just because a virtual community is easy to use and free does not necessarily mean it will be successful and used. However, if a virtual community is created which is extremely expensive or is not easy to use, we can be confident it will not be used.

The model used in this study breaks effort into two driving factors: 'ease of use' and 'cost'.

3.4 Evidence

It takes a while for people to join a new virtual community. There are multiple factors behind this, including advertising, word of mouth and critical mass; however, this can be explained by the normal new-product uptake predicted by the Bass model (Bass 1969).

The model used in this study breaks evidence into two driving factors: 'absence of leadership' and 'perceived absence of population'.

4 Method

A 9-point digital Likert-scale questionnaire was used to investigate the degree to which participants perceived their virtual communities to exhibit each of the 10 target characteristics. For instance, a statement related to 'common values' might be, "I believe in what this virtual community stands for". Three virtual communities were studied (one dating, one social networking and one online gaming website), and a total of 132 questionnaires were completed.

5 Results

A multiple regression was applied to measure the impact of the various factors on the dependent variable of satisfaction with virtual communities.

The results of the multiple regression are shown in Table 1. The only factor significantly connected to people's attitudes towards a virtual community was 'common values' (p < 0.05), although 'ease of use' and 'cost' attained marginal significance (p < 0.1 for both).

6 Discussion

Contrary to our expectations, positive attitudes of virtual community members towards their community were correlated with the (negative) hygiene factors of 'ease of use' and 'cost' as well as with the positive factor of 'common values'. A possible limitation of the study is that it involved relatively small samples from only three communities, so caution should be exercised in extrapolating to the population of virtual communities as a whole.

The data showed a high correlation between community common values, an easy-to-use interface and a relatively cheap service, on the one hand, and a positive attitude towards a virtual community on the other. This could present issues for any further studies focused on Facebook communities, as the interface and cost will be constant across all Facebook groups.

Factor              Alpha   Beta    Sig.
Common Values       0.783   0.443   .000
Common Behaviours   0.37    0.054   .634
Common Interests    0.729   0.038   .759
Relationships       0.852  -0.105   .240
Enjoyment           0.842  -0.031   .816
Perceived Value     0.798  -0.024   .858
Population          0.609   0.095   .352
Ease of use         0.795   0.294   .007
Cost                0.812   0.291   .009
Leadership          0.668  -0.090   .327

Table 1: Significance of various factors in predicting positive attitude towards a virtual community.

7 Conclusion

Analysing different virtual communities is likely to yield different results with different prominent factors. In the case of dating websites, the contribution from 'common values' may be an under-estimated factor and should be a focus of website design.

8 References

Bass, F. (1969): A New Product Growth for Model Consumer Durables. Management Science 15: 215–227.

Benedikt, M. (1991): Introduction. In Cyberspace: First Steps.

Gore, A. J. (1991): Information Superhighways: The next information revolution.

Harris, Z. (1988): Science Sublanguages and the Prospects for a Global Language of Science.

Herzberg, F., Mausner, B. & Snyderman, B. B. (1959): The Motivation to Work.

Hillery, G. A. (1955): Definitions of community: Areas of agreement. Rural Sociology 20: 111–113.

Jones, S. (2003): Encyclopaedia of New Media: An Essential Reference to Communication and Technology. The Moschovitis Group.

Komito, L. (1998): The net as a foraging society: Flexible communities. The Information Society 14: 97–106.

Smith, J. W. (1948): Intrinsic and Extrinsic Good.

Utz, S. (2000): Social Information Processing in MUDs: The Development of Friendships in Virtual Worlds.


Feasibility of Computational Estimation of Task-Oriented Visual Attention

Yorie Nakahira¹, Minoru Nakayama¹,²

¹ Tokyo Institute of Technology. Email: [email protected]

² Tokyo Institute of Technology. Email: [email protected]

Abstract

Eye movements are elicited by a viewer's task-oriented contextual situation (top-down) and the visual environment (bottom-up). The former and its computability were investigated by measuring subjects' eye movements while identical graphs were viewed under different mental frameworks. The prerequisite conditions for predicting task-oriented attention from the visual environment were determined.

Keywords: eye movement prediction; top-down attention; saliency; fixation; gaze duration

1 Introduction

Usability assessment of what we view, by means of visual comprehensiveness, requires the repeated labor-intensive measurement and interpretation of data (Jacob & Karn 2003). Alternatively, a computational model can be used to generate attention information from the visual environment. However, eye movement predictions need to integrate bottom-up image-based saliency prompts and top-down task-oriented contextual situations (Itti & Koch 2001). Current research has difficulty with the modeling of the latter (Tatler 2009) when quantitative models such as Itti's saliency map (Itti et al. 1998) are used.

To get around this problem, this study intends to determine whether it is possible to predict top-down attention using only the visual environment, by exploring the following: (1) which characteristics of images, or meanings the images possess, and (2) what contextual situation, if given to viewers, enables the modelling of task-oriented attention using features of the images.

2 Method

2.1 Eye Movement Recordings

Seven subjects (4 male, 3 female) viewed eight bar-graphs (Figure 1) shown in two different constructs. In the first trial, subjects were told they would be shown rectangular objects, without being conscious that they were bar-graphs; in fact, eight bar-graphs without any axes were shown. In the second trial, subjects were assigned the task of understanding the quantitative data represented, using graphs.


The same eight bar-graphs were shown, but with axes. Calibration preceded both trials. Each image appeared randomly, one at a time, for three seconds, followed by a one-second interval during which subjects fixated on a central cross. Each subject repeated the experiment three times. An eye tracker (nac EMR-NL, 640×480 pixel resolution, 60 Hz) recorded subjects' left eye positions.

2.2 Definitions of Metrics and Fixation Maps

In this paper, fixation is defined as a stable eye position with a velocity below the threshold of 20 degrees per second (Jacob & Karn 2003), gaze duration is defined as the sum of each fixation's duration at a particular region of interest, and scan paths are the spatial arrangements of a sequence of fixations. Scan paths, the total number of fixations on each bar, and the gaze duration on each bar were generated for later analysis.
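For illustration, the velocity-threshold grouping described above could be implemented as follows; this is a generic I-VT style sketch under assumed input formats, not the authors' implementation:

    import numpy as np

    def detect_fixations(gaze_xy, t, deg_per_pixel, vel_threshold=20.0):
        """Velocity-threshold fixation detection.

        gaze_xy: (N, 2) array of gaze positions in pixels; t: (N,) sample
        timestamps in seconds; deg_per_pixel: visual angle per pixel.
        Samples moving slower than vel_threshold (degrees/second) are
        grouped into fixations; returns a list of (x, y, duration) tuples.
        """
        gaze_xy = np.asarray(gaze_xy, dtype=float)
        t = np.asarray(t, dtype=float)
        # Angular velocity between consecutive samples.
        dist = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1) * deg_per_pixel
        vel = dist / np.diff(t)
        slow = np.concatenate([[True], vel < vel_threshold])
        fixations, start = [], None
        for i, s in enumerate(slow):
            if s and start is None:
                start = i
            elif not s and start is not None:
                x, y = gaze_xy[start:i].mean(axis=0)
                fixations.append((x, y, t[i - 1] - t[start]))
                start = None
        if start is not None:
            x, y = gaze_xy[start:].mean(axis=0)
            fixations.append((x, y, t[-1] - t[start]))
        return fixations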

The fixation locations and durations were used to generate two kinds of fixation maps (Wooding 2002), which are three-dimensional matrices: the x-y axes match the size of the image, with the third dimension indicating the number of fixations or the duration of gaze on each area. The Fixation Number Maps (Figure 3) were calculated by dropping an identical Gaussian kernel of the same height at the location of each fixation. The Gaze Duration Maps were calculated using a Gaussian kernel with a height proportional to the fixation duration. The half-height width of the Gaussian kernel is determined according to the area over which a fixation can be said to exist. The map is then normalized so that the final value of the highest peak equals one. The initial fixation locations, which are specified by the central cross at the moment the images appear, are ignored during the calculation of this map.
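A minimal numpy sketch of this map construction (again an illustration under assumed parameters such as the kernel width, not the authors' code) might look like:

    import numpy as np

    def fixation_map(fixations, shape=(480, 640), sigma=25.0,
                     weight_by_duration=False):
        """Build a fixation map by dropping a Gaussian kernel at each fixation.

        fixations: list of (x, y, duration) tuples, duration in seconds.
        weight_by_duration=False yields a Fixation Number Map (identical
        kernel heights); True yields a Gaze Duration Map (kernel height
        proportional to fixation duration).
        """
        h, w = shape
        ys, xs = np.mgrid[0:h, 0:w]
        fmap = np.zeros(shape)
        for x, y, dur in fixations:
            height = dur if weight_by_duration else 1.0
            fmap += height * np.exp(-((xs - x) ** 2 + (ys - y) ** 2)
                                    / (2 * sigma ** 2))
        # Normalize so the highest peak equals one, as described above.
        if fmap.max() > 0:
            fmap /= fmap.max()
        return fmap

The (x, y, duration) tuples produced by the fixation detector sketched earlier can be fed to this function directly; the weight_by_duration flag switches between the two map types.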

3 Results

3.1 Modeling Attention Level

As previous research has suggested, Itti's saliency model was not adaptable to either trial, because it fails to predict either the point on the graph to which the subjects' eyes moved initially or the scan paths. Nevertheless, one notable observation is that when images are viewed sufficiently often (21 times per image), the average amount of attention paid to each region of interest as a whole (in this case, each bar) does have a correlational relationship with one of the saliency features: intensity. This implies that the saliency of intensity may be used to indicate the level of attention given to regions of interest, and to compute the probable amount of visual attention.

Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia

93

Page 108: Australian Computer Society · 2020. 5. 27. · Table of Contents Proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January

The intensity map (Figure 1) of Itti's saliency model may be used as a measure of conspicuousness, to predict the distribution of fixations (we define this as 'Attention Level'). For each trial, for every bar in every image, the sum of the intensity values corresponding to the position of each bar was calculated, then normalized so that the sum of the intensity of all bars in one graph equalled one. Similarly, on the fixation maps, the sum of the values of all viewed pixels corresponding to the position of each bar (the Attention Level) was calculated for both the Fixation Number and Gaze Duration Maps, as a measure of the amount of attention given to each bar, followed by a normalization so that the sum of the Attention Levels of all bars on one bar-graph equalled one.

Table 1 shows the correlation coefficient between the Attention Level, in terms of fixation number or gaze duration, and intensity for both trials. Figure 2 illustrates the relationship between the proportional fixation number and the relative intensity; each dot represents a particular bar in a bar-graph, with attention level on the horizontal axis and intensity level on the vertical axis. Both the fixation number and the intensity are normalized so that the sum over all bars in the bar-graph equals one.

                  trial 1   trial 2
Fixation Number   0.50      0.65
Gaze Duration     0.49      0.69

Table 1: Correlation between Attention Level and Intensity
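The per-bar normalization and pooled correlation behind Table 1 can be sketched as follows (under assumed array layouts; this is not the authors' code):

    import numpy as np

    def attention_vs_intensity(attention_per_bar, intensity_per_bar):
        """Correlate normalized per-bar Attention Level with per-bar intensity.

        Both inputs: lists of 1-D arrays, one array of per-bar sums per graph.
        Each graph's values are normalized so its bars sum to one, then all
        bars from all graphs are pooled into a single correlation coefficient.
        """
        att, inten = [], []
        for a, i in zip(attention_per_bar, intensity_per_bar):
            att.extend(np.asarray(a, dtype=float) / np.sum(a))
            inten.extend(np.asarray(i, dtype=float) / np.sum(i))
        return np.corrcoef(att, inten)[0, 1]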

3.2 Adaptability of the Prediction Model

Contrary to the expectation that saliency features would be insufficient for predicting the top-down attention of the second trial, Table 1 indicates that the correlation for the first trial is no better than for the second trial. To illustrate this tendency, Figure 3 plots the Gaze Duration Map for the image in Figure 1 for the first trial (right) and the second trial (left). The most significant difference in eye movement between the two trials is that subjects in the first trial focused more on a couple of positions corresponding to the largest points of intensity on the map. Subjects in the second trial, however, looked at each bar thoroughly, according to its relative level of intensity, which contributes to a better adaptability of the model for bars which have a relatively low level of intensity.

Since the concept of a saliency map is conventionally used for first-fixation prediction, it works well when the intensity is high but, not surprisingly, loses its accuracy when the intensity is low. Yet the intensity feature is a better predictor in situations where subjects intentionally pay attention to every bar in order to grasp the meaning of the graph as a whole, or in other words, to understand the quantity of each bar in relation to the others. This is noteworthy, as it implies that the difference between the two trials, namely the additional task imposed in the second trial, actually helps, rather than works against, the accuracy of the model.

In conclusion, it can be inferred that the task-oriented Attention Level (not the first fixation) can be predicted by measuring the intensity features of the visual environment when the following two conditions are satisfied: (1) there is an underlying top-down force that directs gaze toward each region of interest, according to the level of perceived importance implicit in its saliency of intensity; and (2) the visual environment (the graphs used in this study) possesses an implicit meaning which can be calculated from the intensity of the image.

4 Summary

The study confirmed that top-down attention is capable of being predicted merely by processing the images in the visual environment. A future direction of study would be to investigate the differences in prediction accuracy between images, to determine which may or may not permit accurate predictions to be calculated.

Figure 1: Example of a displayed image and its intensity map

[Scatter plot omitted: Attention Level (horizontal axis) against intensity conspicuousness (vertical axis); series for the first trial (unconscious) and the second trial (conscious).]

Figure 2: Attention Levels vs conspicuousness

[Fixation map images omitted: x and y axes in pixels (640×480), map values normalized from 0 to 1.]

Figure 3: Fixation maps of Figure 1. The map for the first trial is on the right, the second trial on the left.

References

Jacob, R. J. K. & Karn, K. S. (2003), Eye tracking in human-computer interaction and usability research: Ready to deliver the promises, in The Mind's Eye, pp. 574–605.

Itti, L. & Koch, C. (2001), Computational modelling of visual attention, Nature Reviews Neuroscience 2, pp. 194–203.

Tatler, B. W. (2009), Current understanding of eye guidance, Visual Cognition 17(6/7), pp. 777–789.

Itti, L., Koch, C. & Niebur, E. (1998), A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), pp. 1254–1259.

Wooding, D. S. (2002), Fixation maps: Quantifying eye-movement traces, in Proceedings of ETRA 2002, pp. 31–36.


Magnetic Substrate for use with Tangible Spatial Augmented Reality in Rapid Prototyping Workflows

Tim M. Simon¹, Ross T. Smith²

School of Computer and Information Science, University of South Australia, Adelaide, South Australia

¹ Email: [email protected]  ² Email: [email protected]

Abstract

We present a method for the dynamic manipulation and interchange of tangible objects in a spatial augmented reality environment for use in rapid prototyping. We call this method MagManipulation. It improves on existing methods in several ways: it allows for the use of abstract and non-uniform curves, for easy manipulation on non-tabletop-like surfaces, and for interchangeable tangible objects. Our method allows us to dynamically manipulate tangible objects in a TSAR environment in a manner unattainable with current technologies.

1 Introduction

This paper presents a method for the integration of non-planar, non-uniform substrates with tangible objects in a Tangible Spatial Augmented Reality (TSAR) rapid prototyping workflow using magnetic fields. Current methods of integrating tangible objects in a TSAR rapid prototyping workflow either rely on planar, horizontal surfaces or involve unwieldy or otherwise constraining systems to attach the tangible objects together. By utilising magnetic fields emanating from the tangible object, combined with a ferrous-surface substrate, non-planar, non-uniform tangible objects can be easily and dynamically attached and manipulated in the TSAR rapid prototyping workflow, providing numerous benefits over current TSAR workflows and technologies.

2 Background

Ever since Sutherland proposed the Ultimate Display [6], researchers have been striving to integrate Virtual Reality (VR) worlds with reality through the use of visual, tactile and digital interfaces. Virtual Reality has been used in industrial rapid prototyping for some years [4]. Virtual interaction within the prototyping workflow provides unique capabilities not present in traditional, physical workflows. Spatial Augmented Reality provides additional enhancements [1] to the prototyping workflow not currently present in AR/TAR rapid prototyping systems [3].

Copyright c©2012, Australian Computer Society, Inc. This pa-per appeared at the 13th Australasian User Interface Confer-ence (AUIC 2012), Melbourne, Australia, January-February2012. Conferences in Research and Practice in InformationTechnology (CRPIT), Vol. 126, Haifeng Shen and Ross Smith,Ed. Reproduction for academic, not-for-profit purposes per-mitted provided this text is included.

Pugh [5] describes a total design process made up of six steps, from identification of user needs through to manufacture and sales. Tangible Spatial Augmented Reality provides particular enhancements to both the Conceptual Design and the Physical Design steps, that is, through the conception and creation of numerous, wide-ranging ideas and mock-up designs. However, traditional TSAR systems utilise flat, table-top designs [2]. These provide some benefits for applications such as city planning: mock buildings and trees can be placed on the table easily and intuitively. However, it is difficult to attach tangible objects together, and even placing tangible objects on top of non-planar objects such as tabletop displays is difficult. Thus, the table-top design limits the creativity and natural design process by constraining the mock-up to a flat, planar surface. Whilst this is suitable for some applications, such as city planning and the like, it does not lend itself well to many other applications. For example, a mock-up of a car dashboard loses much of its usefulness if the mock-up is not realistically shaped with curved surfaces; but when realistically shaped, attaching tangible objects to the dashboard poses a potential problem.

3 Non-planar non-uniform TSAR substrates

The use of physical mockups and prototypes during the industrial design process is an important feature, allowing the relationships between components to be explored in a much more intuitive manner than the use of CAD models and drawings alone [1]. The use of Spatial Augmented Reality and Tangible User Interfaces has been shown to enhance this process

Figure 1: Curved substrate with tracked tangible objects in a TSAR environment. (1) Tracked object, (2) curved foam substrate, (3) fridge magnet.


[Porter, Porter, Verlinden]. Thomas et al. [7] further extended the use of Tangible Spatial Augmented Reality in the industrial design process through the utilisation of tangible objects acting as interactable parts of the prototype. However, integrating these tangible objects with non-tabletop-like mockups, such as car dashboards containing both non-planar and non-uniform surfaces, presents several challenges. The tangible object must be attached to the substrate in a manner which allows detachment as well as high-fidelity placement. For example, while a snap-grid system, such as that found on LEGO blocks, would allow the attachment of objects to the substrate, object placement is limited to the fidelity allowed by the snap grid. Furthermore, the integration of such a snap-grid onto the non-planar, non-uniform substrate presents additional challenges, encumbering the design process. Overcoming these challenges with a method that does not encumber the design process, and allows quick, uninhibited attachment of physical artefacts to a substrate, enhances the design process and allows such non-planar, non-uniform substrate models to be used in a manner that is, using traditional tools, quite difficult.

4 TSAR Design Guidelines

Working closely with industrial designers and architects, we have been aiming to extend the existing Pugh's Total Design methodology by providing new features for the industrial designer's toolkit [7]. The purpose of these features is to allow designers to experiment with a wide variety of potential concept prototypes without the overhead of building the prototypes with complex electronics to provide the functionality. For example, a climate control system on a dashboard required buttons and dials to allow the temperature to be set by the user. Our approach has focused on allowing prototypes to be quickly built using simple construction materials like foam board, timber, glue and nails, and on enhancing the simple surfaces with a complex computer graphics finish. With this approach the system can still maintain simulated functionality using the TSAR system, but does not require a significant commitment to explore a concept. For these reasons, we have identified the following requirements to be used as guidelines for the development of future features:

1. The underlying substrate should not include any custom electronic components.

2. The substrate should be painted white to allow for a vibrant surface appearance using the projected SAR system.

3. Light weight.

4. Low cost.

Following these guidelines, we have extended current TSAR environments to allow the dynamic, intuitive manipulation of tangible objects, providing functionality above and beyond existing systems.

5 Concept

We present a system utilising a ferrous coating combined with small magnets integrated into the tangible objects to allow high-fidelity coupling and movement between the object and the substrate. The physical mockup, such as a car dashboard, is created and then coated in a paint containing ferrous material. Following Thomas et al.'s [7] use of 3-dimensionally

printed objects, cheap tangible objects of any shape can be created and embedded with a small magnet, allowing attachment of the object to the substrate or to another tangible object.

6 Grouping of tangible objects

In addition to the intuitive, dynamic manipulation and placement of tangible objects, groups of tangible objects can be created using a substrate covered with a ferrous material with an attached magnet. Tangible objects can be placed onto the substrate in the desired relative positions, and then the substrate can be manipulated, thus moving the group of tangible objects whilst keeping their relative positions. For example, when prototyping a control system for a car dashboard, the air conditioner controls may be configured into the desired relative positions and then the group as a whole shifted relative to the car dashboard. This allows for more intuitive, quick prototyping compared to current technologies, which are cumbersome or manipulate tangible objects only as individuals.

7 Conclusion

In this paper, we present a novel concept for improving existing TSAR rapid prototyping workflows to include non-planar, non-uniform substrates. This concept improves upon existing technologies to enhance the industrial design workflow, allowing the manipulation of both single and groups of tangible objects in a TSAR rapid prototyping environment.

References

[1] O. Bimber and R. Raskar. Spatial Augmented Reality: Merging Real and Virtual Worlds. A K Peters, Wellesley, 2005.

[2] H. Ishii and B. Ullmer. Tangible bits: Towards seamless interfaces between people, bits and atoms. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '97, pages 234–241, New York, NY, USA, 1997. ACM.

[3] M. Marner and B. Thomas. Augmented foam sculpting for capturing 3D models. In 3D User Interfaces (3DUI), 2010 IEEE Symposium on, pages 63–70, 2010.

[4] T.-J. Nam. Sketch-based rapid prototyping platform for hardware-software integrated interactive products. In CHI '05 Extended Abstracts on Human Factors in Computing Systems, CHI EA '05, pages 1689–1692, 2005.

[5] S. Pugh. Total Design: Integrated Methods for Successful Product Engineering. Addison-Wesley, 1991.

[6] I. E. Sutherland. The ultimate display. In Proceedings of IFIP, volume 2, pages 506–508, 1965.

[7] B. Thomas, M. Smith, T. Simon, J. Park, J. Park, S. V. Itzstein, and R. Smith. Glove-based sensor support for dynamic tangible buttons in spatial augmented reality design environments. In Wearable Computers, 2011. Proceedings. Fourteenth IEEE International Symposium on, 2011.


Data Mining Office Behavioural Information from Simple Sensors

Samuel J. O'Malley, Ross T. Smith and Bruce H. Thomas

School of Computer and Information Science, University of South Australia, Mawson Lakes Boulevard, Mawson Lakes, South Australia, 5095

Email: [email protected], [email protected], [email protected]

Abstract

This paper discusses the concept of using three simple sensors to monitor the behavioural patterns of an office occupant. The goal of this study is to capture behavioural information about the occupant without the use of invasive sensors, such as cameras, that do not maintain a level of privacy when installed. Our initial analysis has shown that data mining can be applied to capture re-occurring behaviours and provide real-time presence information to others who occupy the same building.

Keywords: Digital Foam, Data Mining, Apriori Algorithm, Non-invasive, Ambient Display, Market Basket Analysis

1 Introduction

This paper explores the concept of using simple sensors incorporated into an office environment to capture behavioural information that provides presence information to co-workers through an ambient display. Two sensors are used to capture the state information of doors as they are opened and closed, in conjunction with a pressure-sensitive cushion on the desk chair which we refer to as a 'seat sensor'. We present how these three simple sensors can be set up to log information that determines physical presence and behavioural habits by data mining the sensor information. By itself this data means nothing, but by data mining the information over a long period of time we can start to see reoccurring patterns in the occupant's behaviour.

Sensors capture data from the physical world and convert the information to analogue or digital data. Electronic sensors can capture temperature, humidity, light and sound, amongst others. We consider the complexity of a sensor in terms of its data complexity. For example, a 1.3 megapixel webcam at 24 FPS generates approximately 20 gigabytes of information per hour. Although complex sensors are very powerful for capturing data, this paper shows how useful information can also be captured from simple sensors and used to identify everyday behavioural patterns. By capturing the state of these three sensors we can start to build a behavioural presence model using data mining.

2 Background

Sensors come in many different forms, and new types are still being discovered. In this paper we are interested in capturing two events: firstly, the state of a door, to identify when it is open or closed; and secondly, when an office seat is occupied.

Data mining provides a mechanism for discovering useful information by analysing captured data to provide a summary of the patterns that occur. The Apriori algorithm is commonly used on massive supermarket transaction databases to discover association rules, such as which products are commonly purchased together. This is called market basket analysis (Chen et al., 2005).


3 Simple Sensors Behaviour Concept

While a camera in the office would provide us with all the information needed to monitor an occupant's presence, it can be very invasive. By utilising simple sensors, such as magnetic door sensors and a seat sensor, the only information recorded is that which could be obtained by walking past the office and looking in. By employing the state information of these three sensors, we can discover behavioural information, such as the most likely times that the occupant is available, as a pattern observed over a period of time.

From analysing the layout of the office in Figure 2, we noted that there are three possible locations where simple sensors provide the most meaningful information. The office we are interested in shares a reception area with another office. As either door can be locked to outsiders, we chose to place magnetic door switches on both the inner and outer doors. Finally, we decided to place a sensor on the desk chair to determine whether the occupant is seated. These three states give us an insight into the occupant's availability.

3.1 Data Mining Sensor Information

The goal of data mining the sensor information is to discover patterns in the office occupant's behaviour. This gives us an insight into the most likely times that the occupant is available. We used the Apriori algorithm (Agrawal and Srikant, 1994) to generate a list of association rules. In order to apply the Apriori algorithm, we logged the data as transactions, where a new item is added to the log when a state changes. An example of a useful association rule is '!inner !seated 97%', which translates to: 97 percent of the time, if the inner door is shut then the occupant is not seated.
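For illustration, rules of this form can be mined with an off-the-shelf Apriori implementation such as the one in the mlxtend library; the transaction encoding below is an assumption, since the paper does not specify its log format:

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # Hypothetical one-hot transaction log: one row per state-change event.
    # Negated states ("!inner", "!seated") are encoded as their own items,
    # since Apriori mines the presence of items in transactions.
    log = pd.DataFrame({
        "inner_closed": [True, True, False, True, True],
        "outer_closed": [True, False, False, True, True],
        "not_seated":   [True, True, False, True, True],
    })

    # Frequent itemsets with support >= 40%, then rules with confidence
    # >= 90%, yielding rules such as {inner_closed} -> {not_seated}
    # (cf. "!inner !seated 97%").
    itemsets = apriori(log, min_support=0.4, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
    print(rules[["antecedents", "consequents", "support", "confidence"]])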

3.2 Ambient Display of Information

A visualisation we employed to present the status information to others in the same building, but in a different room, follows a traffic light metaphor (see Figure 1). The purpose of the Traffic Light visualisation is to convey the current availability of the occupant. The visualisation utilises the behavioural knowledge gained from preliminary data logging, based on four weeks of data, and attempts to determine whether the occupant is available or not using the state changes. The traffic light uses ambient lights to display and communicate information without distracting the user from their primary task. We employ the philosophy of Ishii et al. used in their ambientRoom system; the ambientRoom uses simple lights to convey information such as the number of hits on a website (Ishii and Ullmer, 1997).

4 Implementation

Figure 1: Traffic Light Visualisation

We employ magnetic door switches and a foam-based pressure 'seat' sensor in the implementation of our system. These sensors are monitored by a microcontroller that captures changes in sensor state. Magnetic door sensors are used to acquire state information from the two doors of the office; the state identified is either open or closed. These sensors are very simple: the switch is mounted on the door frame and a magnet is placed on the door, so that when the door is shut the magnet pulls the switch into an open position. This state is read by the microcontroller and communicated to the computer as a string of comma-delimited integers over the serial port.
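On the computer side, this feed can be read with a short routine like the following sketch (using the pyserial library; the port name, baud rate and field order are assumptions, not taken from the paper):

import serial  # pyserial

def read_sensor_states(port="/dev/ttyUSB0", baud=9600):
    # Yields one (inner_door, outer_door, seat) tuple per line of the
    # comma-delimited feed; the field order here is an assumption.
    with serial.Serial(port, baud, timeout=1) as link:
        while True:
            line = link.readline().decode("ascii", errors="ignore").strip()
            if not line:
                continue  # read timed out with no data
            try:
                yield tuple(int(v) for v in line.split(","))
            except ValueError:
                pass  # skip partial or corrupted lines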

To detect when the occupant is seated, we created a cushion featuring four sensors made from a material called Digital Foam. Digital Foam is a conductive foam that changes resistance when compressed (Smith, 2009); it has also been used in an input device for free-form 3D modelling that records the depth and location of finger presses on its surface (Smith et al., 2008). By reading the resistance values with the microcontroller, we can detect whether the cushion is being compressed. Using four sensors potentially enables us to differentiate between a person and inanimate objects, such as a stack of books, placed on the chair.
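One way this differentiation could work is sketched below; the thresholds are illustrative assumptions rather than measured values, as the paper only notes that four sensors make the distinction possible:

PRESS_THRESHOLD = 600        # assumed ADC reading indicating genuine compression
MIN_SENSORS_FOR_PERSON = 3   # a seated person tends to load most of the cushion

def cushion_state(readings):
    # readings: the four Digital Foam resistance values read by the microcontroller.
    pressed = sum(1 for r in readings if r > PRESS_THRESHOLD)
    if pressed >= MIN_SENSORS_FOR_PERSON:
        return "person"
    if pressed >= 1:
        return "object"  # e.g. a stack of books covering part of the seat
    return "empty"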

The software architecture consists of two modules: data logging and visualisation. The data logging software reads the sensor information from the serial port and converts it into a meaningful format. This information, along with the time and date, is stored in a file that can later be used for data mining; it is also sent across the network using TCP/IP in a standard client-server configuration. The visualisation software then presents the data via the ambient Traffic Light visualisation and as a floor plan diagram, with animated door and seat state changes, on a wall-mounted plasma screen.
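A minimal sketch of the logging side, assuming a CSV log file and a hypothetical address for the visualisation server, might look like this:

import socket
import time

LOG_PATH = "sensor_log.csv"            # assumed file name
VIS_ADDR = ("vis-host.example", 9000)  # hypothetical visualisation server

def record_state_change(inner, outer, seated):
    # Append a timestamped entry for later data mining, then forward the
    # same entry to the visualisation module over TCP.
    entry = f"{time.strftime('%Y-%m-%d %H:%M:%S')},{inner},{outer},{seated}\n"
    with open(LOG_PATH, "a") as log_file:
        log_file.write(entry)
    with socket.create_connection(VIS_ADDR, timeout=2) as conn:
        conn.sendall(entry.encode("ascii"))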

Figure 2: Office Floor Plan Visualisation

5 Results

The data used to generate the results was gathered over four weeks of constant logging. Each log entry contains the day of the week and the time of day, categorised as early morning (before 9am), morning (9am to midday), afternoon (until 5pm) or evening (until midnight). In this paper we focus on finding associations in the data rather than graphing behaviour over time, because we have only recorded four weeks of data. In the future, once we have logged months or even years of data, we will use graphs to visualise the occupant's recurring behavioural patterns over time.
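As a small sketch of this categorisation (the function and label names are ours, not the paper's), each timestamp maps to a time-of-day bucket plus a day of the week:

from datetime import datetime

def time_bucket(ts: datetime) -> str:
    if ts.hour < 9:
        return "early_morning"  # before 9am
    if ts.hour < 12:
        return "morning"        # 9am to midday
    if ts.hour < 17:
        return "afternoon"      # until 5pm
    return "evening"            # until midnight

def log_items(ts: datetime) -> set:
    # Items attached to each transaction for the association analysis.
    return {time_bucket(ts), ts.strftime("%A")}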

Table 1: Top Results from the Apriori Algorithm

Rule   Confidence
1      100%
2      94%
3      72%
4      66%
5      100%
6      97%
7      92%
8      97%
9      96%
10     97%
11     89%
12     6%
13     87%
14     69%
15     91%
16     95%
17     79%

Key: !inner = Inner Door Shut, inner = Inner Door Open

The association rules generated by the Apriori algorithm give us some useful insights into the office occupant's behaviour; Table 1 lists the most relevant association rules that we have selected.

Rules 1 to 4 show that even if the door is open before 9am, it is very unlikely that the occupant is seated yet, and that the afternoon is the most likely time for the occupant to be seated.

Rules 5 to 7 can be interpreted as showing that the afternoon is the most likely time for the occupant to be working with the office door shut.

Rules 8 to 11 show 10 cases over four weeks of working with the inner door shut, 11 cases with the outer door shut, and 5 cases with both shut. This shows that working with the door shut is standard behaviour for the occupant, although it is not done all the time. Consequently, if either door is shut but the occupant is seated, we display a yellow light on the Traffic Light to signify that the occupant is busy; a green light shows that the occupant is available, and a red light shows unavailability.
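The resulting mapping from sensor states to light colours can be summarised by the following sketch (the boolean parameter names are ours):

def traffic_light(inner_open, outer_open, seated):
    # Green: seated with both doors open (available).
    # Yellow: seated but at least one door shut (busy).
    # Red: not seated (unavailable).
    if seated and inner_open and outer_open:
        return "green"
    if seated:
        return "yellow"
    return "red"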

Rules 12 to 16 show the days on which the occupant is unlikely to be seated. Monday has the highest likelihood of the occupant being seated, which could mean that Monday is the day on which the occupant sits for the longest time without leaving the office. If we count how much activity occurs on each day, Monday and Wednesday are the most active days, with Thursday and Friday the least active. Together, these observations suggest that Monday is the busiest day in the office. However, the scope of this system covers only the activity that occurs inside the office; Thursday and Friday could be equally busy, with many meetings and commitments outside the office.

Rule 17 shows that 79% of Wednesday's activity occurred with the inner door open. This could mean that Wednesday is a day when the secretary and the other office occupant were not present, so that when the occupant left his office he locked the outer door rather than both doors.

6 Conclusion

What we hope to achieve with this analysis is a summary of behavioural information over time. From these preliminary results we can already see glimpses of what we could learn from a larger dataset. Provided the system is kept running, the data mining will reveal new patterns, and recurring behaviours spanning many years could potentially be identified. Providing co-workers with this information, where appropriate, will allow them to select the most suitable time to find the occupant. In the future we could also include other data, such as public holiday dates, semester times and conference submission deadlines, to generate an even more detailed behavioural presence model.

7 References

AGRAWAL, R. & SRIKANT, R. 1994. Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc.

CHEN, Y.-L., TANG, K., SHEN, R.-J. & HU, Y.-H. 2005. Market basket analysis in a multiple store environment. Decision Support Systems, 40, 339-354.

ISHII, H. & ULLMER, B. 1997. Tangible bits: towards seamless interfaces between people, bits and atoms. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Atlanta, Georgia, United States: ACM.

SMITH, R. T. 2009. Digital foam: a 3D input device.

SMITH, R. T., THOMAS, B. H. & PIEKARSKI, W. 2008. Digital foam interaction techniques for 3D modeling. Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology. Bordeaux, France: ACM.


Author Index

Donovan, Michael, 49
George, Reece, 49
Grantham, Elizabeth, 21
Grantham, Jonathon, 21, 91
Guan, Li, 69
Habel, Cullen, 91
Hartmann, Gabriel, 39
Hinze, Annika, 89
Li, Ivan K. Y., 59
Lutteroth, Christof, 59
Marner, Michael R., 77
Mass, Ewald T. A., 77
Maynard, John, 49
McAdam, Rohan, 11
Muller, Knut, 89
Nakahira, Yorie, 93
Nakayama, Minoru, 93
Nesbitt, Keith, 11, 49
Nuchanan, George, 89
O'Malley, Samuel J., 97
Park, Joonsuk, 29
Park, Jun, 29
Peek, Edward M., 59
Pilgrim, Chris J., 3
Powers, David, 21
Shen, Haifeng, iii
Simon, Tim M., 29, 95
Smith, Mark, 29
Smith, Ross T., iii, 29, 77, 95, 97
Thomas, Bruce, 29
Thomas, Bruce H., 77, 97
Von Itzstein, Stewart, 29
Wunsche, Burkhard C., 39, 59, 69


Recent Volumes in the CRPIT Series

ISSN 1445-1336

Listed below are some of the latest volumes published in the ACS Series Conferences in Research and Practice in Information Technology. The full text of most papers (in either PDF or Postscript format) is available at the series website http://crpit.com.

Volume 113 - Computer Science 2011. Edited by Mark Reynolds, The University of Western Australia, Australia. January 2011. 978-1-920682-93-4.
Contains the proceedings of the Thirty-Fourth Australasian Computer Science Conference (ACSC 2011), Perth, Australia, 17-20 January 2011.

Volume 114 - Computing Education 2011. Edited by John Hamer, University of Auckland, New Zealand and Michael de Raadt, University of Southern Queensland, Australia. January 2011. 978-1-920682-94-1.
Contains the proceedings of the Thirteenth Australasian Computing Education Conference (ACE 2011), Perth, Australia, 17-20 January 2011.

Volume 115 - Database Technologies 2011. Edited by Heng Tao Shen, The University of Queensland, Australia and Yanchun Zhang, Victoria University, Australia. January 2011. 978-1-920682-95-8.
Contains the proceedings of the Twenty-Second Australasian Database Conference (ADC 2011), Perth, Australia, 17-20 January 2011.

Volume 116 - Information Security 2011. Edited by Colin Boyd, Queensland University of Technology, Australia and Josef Pieprzyk, Macquarie University, Australia. January 2011. 978-1-920682-96-5.
Contains the proceedings of the Ninth Australasian Information Security Conference (AISC 2011), Perth, Australia, 17-20 January 2011.

Volume 117 - User Interfaces 2011. Edited by Christof Lutteroth, University of Auckland, New Zealand and Haifeng Shen, Flinders University, Australia. January 2011. 978-1-920682-97-2.
Contains the proceedings of the Twelfth Australasian User Interface Conference (AUIC2011), Perth, Australia, 17-20 January 2011.

Volume 118 - Parallel and Distributed Computing 2011. Edited by Jinjun Chen, Swinburne University of Technology, Australia and Rajiv Ranjan, University of New South Wales, Australia. January 2011. 978-1-920682-98-9.
Contains the proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing (AusPDC 2011), Perth, Australia, 17-20 January 2011.

Volume 119 - Theory of Computing 2011. Edited by Alex Potanin, Victoria University of Wellington, New Zealand and Taso Viglas, University of Sydney, Australia. January 2011. 978-1-920682-99-6.
Contains the proceedings of the Seventeenth Computing: The Australasian Theory Symposium (CATS 2011), Perth, Australia, 17-20 January 2011.

Volume 120 - Health Informatics and Knowledge Management 2011. Edited by Kerryn Butler-Henderson, Curtin University, Australia and Tony Sahama, Queensland University of Technology, Australia. January 2011. 978-1-921770-00-5.
Contains the proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2011), Perth, Australia, 17-20 January 2011.

Volume 121 - Data Mining and Analytics 2011. Edited by Peter Vamplew, University of Ballarat, Australia, Andrew Stranieri, University of Ballarat, Australia, Kok-Leong Ong, Deakin University, Australia, Peter Christen, Australian National University, Australia and Paul J. Kennedy, University of Technology, Sydney, Australia. December 2011. 978-1-921770-02-9.
Contains the proceedings of the Ninth Australasian Data Mining Conference (AusDM'11), Ballarat, Australia, 1–2 December 2011.

Volume 122 - Computer Science 2012. Edited by Mark Reynolds, The University of Western Australia, Australia and Bruce Thomas, University of South Australia. January 2012. 978-1-921770-03-6.
Contains the proceedings of the Thirty-Fifth Australasian Computer Science Conference (ACSC 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 123 - Computing Education 2012. Edited by Michael de Raadt, Moodle Pty Ltd and Angela Carbone, Monash University, Australia. January 2012. 978-1-921770-04-3.
Contains the proceedings of the Fourteenth Australasian Computing Education Conference (ACE 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 124 - Database Technologies 2012. Edited by Rui Zhang, The University of Melbourne, Australia and Yanchun Zhang, Victoria University, Australia. January 2012. 978-1-920682-95-8.
Contains the proceedings of the Twenty-Third Australasian Database Conference (ADC 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 125 - Information Security 2012. Edited by Josef Pieprzyk, Macquarie University, Australia and Clark Thomborson, The University of Auckland, New Zealand. January 2012. 978-1-921770-06-7.
Contains the proceedings of the Tenth Australasian Information Security Conference (AISC 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 126 - User Interfaces 2012. Edited by Haifeng Shen, Flinders University, Australia and Ross T. Smith, University of South Australia, Australia. January 2012. 978-1-921770-07-4.
Contains the proceedings of the Thirteenth Australasian User Interface Conference (AUIC2012), Melbourne, Australia, 31 January – 3 February 2012.

Volume 127 - Parallel and Distributed Computing 2012. Edited by Jinjun Chen, University of Technology, Sydney, Australia and Rajiv Ranjan, CSIRO ICT Centre, Australia. January 2012. 978-1-921770-08-1.
Contains the proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing (AusPDC 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 128 - Theory of Computing 2012. Edited by Julian Mestre, University of Sydney, Australia. January 2012. 978-1-921770-09-8.
Contains the proceedings of the Eighteenth Computing: The Australasian Theory Symposium (CATS 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 129 - Health Informatics and Knowledge Management 2012. Edited by Kerryn Butler-Henderson, Curtin University, Australia and Kathleen Gray, University of Melbourne, Australia. January 2012. 978-1-921770-10-4.
Contains the proceedings of the Fifth Australasian Workshop on Health Informatics and Knowledge Management (HIKM 2012), Melbourne, Australia, 30 January – 3 February 2012.

Volume 130 - Conceptual Modelling 2012. Edited by Aditya Ghose, University of Wollongong, Australia and Flavio Ferrarotti, Victoria University of Wellington, New Zealand. January 2012. 978-1-921770-11-1.
Contains the proceedings of the Eighth Asia-Pacific Conference on Conceptual Modelling (APCCM 2012), Melbourne, Australia, 31 January – 3 February 2012.

Volume 131 - Advances in Ontologies 2010. Edited by Thomas Meyer, UKZN/CSIR Meraka Centre for Artificial Intelligence Research, South Africa, Mehmet Orgun, Macquarie University, Australia and Kerry Taylor, CSIRO ICT Centre, Australia. December 2010. 978-1-921770-00-5.
Contains the proceedings of the Sixth Australasian Ontology Workshop 2010 (AOW2010), Adelaide, Australia, 7th December 2010.