
Designing Effective Gaze Mechanisms for Virtual Agents

Sean Andrist, Tomislav Pejsa, Bilge Mutlu, Michael Gleicher
Department of Computer Sciences, University of Wisconsin–Madison

1210 West Dayton Street, Madison, WI 53706, USA
{sandrist, tpejsa, bilge, gleicher}@cs.wisc.edu

ABSTRACT
Virtual agents hold great promise in human-computer interaction with their ability to afford embodied interaction using nonverbal human communicative cues. Gaze cues are particularly important to achieve significant high-level outcomes such as improved learning and feelings of rapport. Our goal is to explore how agents might achieve such outcomes through seemingly subtle changes in gaze behavior and what design variables for gaze might lead to such positive outcomes. Drawing on research in human physiology, we developed a model of gaze behavior to capture these key design variables. In a user study, we investigated how manipulations in these variables might improve affiliation with the agent and learning. The results showed that an agent using affiliative gaze elicited more positive feelings of connection, while an agent using referential gaze improved participants' learning. Our model and findings offer guidelines for the design of effective gaze behaviors for virtual agents.

Author Keywords
Nonverbal behavior; gaze; virtual agents; affiliation; learning

ACM Classification Keywords
H.1.2 Models and Principles: User/Machine Systems—Human factors; H.5.2 Information Interfaces and Presentation: User Interfaces—Evaluation/methodology, User-centered design

General Terms
Design, Experimentation, Human Factors

INTRODUCTION
Virtual agents hold tremendous potential for computer interfaces with their unique ability to embody humanlike attributes [7]. These attributes form rich communication mechanisms with which people are intimately familiar and afford intuitive interactions. Variations in these attributes activate key social and cognitive processes, in turn eliciting significant positive outcomes such as improved learning and rapport in key application domains such as education [33], collaboration [46], and therapy [49]. A deeper understanding of how humans use these attributes in social interactions might enable designers to create more effective virtual agents in these domains.

Figure 1. Erin, one of the humanlike virtual agents we used in our study, which examined how virtual agents could use their gaze effectively in an educational scenario. Here Erin is giving a lecture on geographical locations of ancient China.

Gaze cues are an important subset of the embodied cues that people employ in social interactions. Using gaze cues, people can control the flow of a conversation, indicate interest in or appraisal of objects and people, improve listeners' comprehension, express complex emotions, and facilitate interpersonal processes [1, 3, 36]. In order to design virtual agents that achieve such significant effects, we must first design mechanisms that capture key variables of gaze and generate appropriate gaze behaviors. However, creating effective gaze cues for virtual characters is an open challenge: humans have built and refined subtle but complex patterns of using these cues over thousands of years of evolution, and have developed a sensitivity to observing them in others [42]. Furthermore, how these design variables might be varied to achieve specific high-level social and cognitive outcomes is not well understood. The goal of our research is to explore how agents might achieve these outcomes through subtle variations in gaze behaviors and what design variables might enable such variations. We contextualize the work presented in this paper in an educational scenario, in which virtual agents promise improvements in affiliation with students and in learning.

In this paper, we demonstrate that virtual agents can use subtle variations in gaze to achieve significant high-level outcomes. We present a parametric model that synthesizes appropriate gaze behaviors and allows for systematic variations in these behaviors, which we validated through an online user study with 96 participants. Our design of the gaze model draws on research in human physiology, which precisely describes how humans coordinate various gaze parameters, such as the movements of the eyes and the head, when executing gaze shifts: the intentional redirection of gaze toward a particular piece of information in the context of interaction. Gaze shifts are a fundamental building block of overall gaze behavior. Through subtle variation in the timing and movement of the head and eyes, individuals construct a range of high-level behaviors. Our model of gaze was designed and parameterized in such a way as to provide sufficient control over the subtleties of head and eye movements, enabling a virtual agent to naturally display these behaviors.

We also present a user study with 20 participants, in which we investigated how a virtual agent (Figure 1) might manipulate low-level gaze parameters to achieve high-level effects such as improved learning and feelings of rapport in a storytelling context. We manipulated the parameters of our gaze model to create affiliative gaze (maintaining a head orientation toward the participant to emphasize the social interaction [12]) and referential gaze (maintaining a head orientation toward shared visual space to emphasize the information to which the speaker is referring [30]). In both cases, the agent looks at the same targets; the difference lies in the timing and the degree to which the agent uses its head or eyes to look at these targets. The results showed that these manipulations generated significant social and cognitive effects; the use of affiliative gaze improved perceptions of the agent, and the use of referential gaze increased recall performance.

The remainder of the paper reviews related work on gaze from a number of perspectives, describes our model and a validation study, outlines the design and results of our user study, and discusses our findings and their implications for the design of effective gaze mechanisms for virtual agents.

BACKGROUND
Gaze is an important social cue that has been studied in a number of different academic disciplines. First we present work in psychology on how humans use their gaze in social interactions, as well as some of the achievable high-level effects of gaze. Next we discuss related work on embodied conversational agents in the computer graphics and HCI literatures.

Gaze in Human Communication
In general, being gazed at by another person can result in a number of interesting effects. Gaze is predominantly interpreted as a sign of being attended to, and depending on the context of interaction, being gazed at can lead to discomfort from feeling observed or to genuine social interaction [1]. People who make more eye contact are perceived as more dynamic, likeable, and believable [4], and are also seen as more truthful or credible [1]. One illustrative study specifically showed that people who spend more time gazing at an interviewer receive higher socioemotional evaluations [16]. On the other hand, gaze aversion produces consistently negative effects on impressions of attraction, credibility, and relational communication [5].

An important construct in the study of nonverbal behavior is immediacy, defined as the degree of perceived physical or psychological closeness between people [37]. Gaze has been found to be a significant component of immediacy, especially in the context of improving educational outcomes [21]. Students from primary school age through college have been shown to learn better when they are gazed at by the lecturer [41, 47]. Learning in these cases was usually measured by the students' performance on recall tasks. In a similar study, gazing into the camera during a video link conversation was shown to increase the recall of the viewer at the other end [14]. Gaze's positive effect on recall is usually attributed to its role as an arousal stimulus, which increases attentional focus and therefore enhances memory [26].

The above work shows that gaze as a whole can create positive interactions and learning effects. Our goal is to investigate how subtle parameters of gaze might serve to strengthen or weaken these effects. One of these parameters is the alignment of the head, which plays a substantial role in shifting a viewer's visual attention [23]. Gazing at someone with the head fully aligned (affiliative gaze) might be perceived differently than gazing at someone out of the corner of the eyes with the head aligned toward information of interest in the context of interaction (referential gaze). A virtual agent might be able to manipulate this and other parameters to achieve specific desired effects.

Embodied Conversational Agents
Our work builds on previous research in the area of embodied conversational agents. Developing embodied agents that can communicate effectively is a significant area of research. Giving these agents more complex and humanlike behaviors has been a longstanding goal, for example to make inhabited interfaces and agents in VR environments more effective [40]. The ability to use nonverbal communicative behavior is very important, as it increases positive feelings of copresence and familiarity and in general makes agents more effective communicators [2].

An agent's gaze behavior is very important to providing rich interactions; well-designed gaze mechanisms (e.g., gazing at turn-taking boundaries during conversation) can result in more efficient task performance and more positive subjective evaluations [22]. However, poor gaze behavior can be worse than no gaze at all: the positive effects of having an embodied agent, as opposed to only audio or text, can be completely lost if the gaze is very poor or random [15]. The gaze behavior of a female virtual agent, when coupled with the agent's appearance, can make the difference between reinforcing negative attitudes toward women and breaking gender stereotypes [10]. The high-level effects of gaze discussed above have also been shown to extend to physical embodiments, such as robots. For example, the learning effect of gaze has been observed in human-robot interaction: increased gaze from a storytelling robot facilitates greater recall of the story [39].

Previous work on modeling human gaze to inform the design of agents includes the area of turn management [8, 43], determining where agents should look and why they should be looking there [25, 31], and modeling how an agent should make random eye saccades in idle situations [6] or face-to-face conversations [32]. Attempts have also been made to model head and eye motions for virtual agent gaze, including data-driven [9] and procedural [45] approaches, as well as a hybrid of the two [29]. In general, existing research has shown that more realistic gaze behavior for humanoid avatars results in considerably improved and more fluid communication [15]. At the very least, simply conveying gaze direction eases turn-taking and is essential to establishing who is talking and listening to whom in multiparty mediated communication [50].

Figure 2. A visual representation of our model of gaze shifts mapped onto time. Key input variables and processes include (A) target, agent, and environmental parameters, (B) head latency, (C) velocity profiles for head and eye motion, (D) oculomotor range (OMR) specifications, (E) head alignment preferences, and (F) the vestibulo-ocular reflex (VOR).

While previous research has explored how gross manipulations of gaze behavior might shape user experience and perceptions of virtual characters, how specific parameters in the control space for gaze might be mapped to specific interaction outcomes has not been explored. Our work seeks to fill this knowledge gap. Our model of gaze, presented next, provides parameters that work as low-level gaze variables and allows us to investigate their impact on high-level outcomes in a principled way. Advantages of our model include its procedural simplicity, its grounding in physiological research, and its demonstrated ability to produce large objective and subjective high-level effects through the manipulation of very subtle parameters.

DESIGNING EFFECTIVE GAZE MECHANISMS
In order to design virtual agents to be effective communicators, we must develop a deeper understanding of how humans use subtle gaze behavior to achieve high-level outcomes. We turn to research in neurophysiology to develop this understanding. Neurophysiologists have studied how humans and other primates coordinate head and eye movements during gaze shifts, in the process revealing potential parameters of gaze that might eventually lead to the positive social outcomes we wish to create with virtual agents. Here we present the most relevant findings from the neurophysiology literature, which we combine into a model of gaze shifts useful to virtual agents. We also present a validation of our model, taking the form of a user study, which shows that virtual agents using the model can execute gaze shifts that are both communicative and natural. (A full specification of the model, including documented pseudocode, full details of the validation, and videos of animated characters using the model for their own gaze behavior, can be found at http://hci.cs.wisc.edu/projects/gaze.)

Given the current configuration of the head and eyes and input parameters indicating movement characteristics and target gaze direction, our model computes a trajectory for the head and eyes. It first computes a few key variables for the movement, and then computes velocities for the head and eye rotations for each frame of the animated movement, keeping the gaze on target until the eyes and head have reached their final target rotations. The model is presented graphically in Figure 2, and includes six main components: (A) target, agent, and environmental parameters, (B) head latency, (C) velocity profiles for head and eye motion, (D) oculomotor range (OMR) specifications, (E) head alignment preferences, and (F) the vestibulo-ocular reflex (VOR).

The first variable to compute when executing a gaze shift is the latency of the head movement relative to the eye movement (Figure 2b). This head latency can vary from person to person and task to task. Factors such as target amplitude, the predictability of the target [44], the modality of the target (visual or auditory) [17, 18], target saliency, the vigilance of the subject, and whether the gaze shift is forced or natural [51] all have an effect. These factors serve as the target, agent, and environmental input parameters to our model (Figure 2a). Our model stochastically combines these parameters to determine the head latency, based on findings from Zangemeister et al. [51]. A positive latency results in a period of eye motion during which the head remains fixed, while a negative latency results in a period of head motion during which the eyes remain fixed.
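The paper does not reproduce the latency distributions from Zangemeister et al. [51], so the following is only an illustrative stand-in: a sampler whose sign convention matches the model (positive latency: eyes lead; negative: head leads), with placeholder ranges and a placeholder link between target predictability and early head movement.

```python
import random

def sample_head_latency(target_predictable: bool) -> float:
    """Illustrative head latency sampler (seconds). Positive values mean
    the eyes start before the head; negative values mean the head starts
    first. Both the ranges and the predictability effect below are
    placeholders, not values derived from Zangemeister et al. [51]."""
    if target_predictable:
        # Assumed for illustration: predictable targets allow the head
        # to start earlier, so latencies skew toward zero or negative.
        return random.uniform(-0.10, 0.05)
    return random.uniform(0.0, 0.10)

print(sample_head_latency(target_predictable=True))
```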

Once the head latency period is complete, the eyes and head both move simultaneously toward the target. Both during and after the latency period, each eye and the head follow velocity profiles (Figure 2c) that resemble standard ease-in and ease-out functions [27, 32]. These profiles prevent unnatural head motions caused by high-frequency signals [35]. The maximum velocities of the eyes and head (the peaks of the velocity profiles) are computed from positive linear relationships with the amplitude of the intended gaze shift [20]. Once the maximum velocity for the gaze shift is computed, the shift is executed by computing the actual velocity for each frame of the animation. The velocity is determined by a piecewise polynomial function derived to approximate the published experimental data. This polynomial function can be expressed as follows, where g is the proportion of the gaze shift completed, Vmax is the maximum velocity, and V is the current calculated velocity:


\[
V = \begin{cases} 2V_{max} \cdot g & g \in [0, 0.5) \\ 4V_{max} \cdot g^2 - 8V_{max} \cdot g + 4V_{max} & g \in [0.5, 1] \end{cases}
\]
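This profile rises linearly to its peak Vmax at the midpoint of the shift and falls quadratically to zero at completion (the second branch equals 4Vmax(g - 1)^2). A minimal sketch in code; the function name is ours:

```python
def shift_velocity(g: float, v_max: float) -> float:
    """Velocity at proportion g of a gaze shift: linear ease-in to the
    peak v_max at g = 0.5, then quadratic ease-out to zero at g = 1."""
    if g < 0.5:
        return 2.0 * v_max * g
    return 4.0 * v_max * g**2 - 8.0 * v_max * g + 4.0 * v_max
```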

An important component of the model is the oculomotor range (OMR) (Figure 2d), which limits the rotation of the eyes such that they do not roll back into the head. The human OMR has been estimated to be between 45° and 55°. A virtual character's baseline OMR can be determined based on the size of its eye cavities and the size of its pupils and irises. However, merely encoding these OMR values as static parameters is not sufficient, as the effective OMR may fluctuate during the course of a single gaze shift. This fluctuation is a product of the neural (as opposed to mechanical) nature of the limitation imposed on eye motion. The virtual agent's eye rotations are never allowed to surpass the current effective OMR.

At the onset of a gaze shift, the effective OMR is computed based on the initial eye position and the baseline OMR. Initial eye position is measured in degrees as a rotational offset of the current eye orientation from a central (in-head) orientation. This value is only non-zero when the rotational offset is contralateral (on the opposite side of center) to the target. When the eyes begin the gaze shift at these angles, the effective OMR has a value close to the original baseline value. When the eyes begin the gaze shift closer to a central orientation in the head, the effective OMR diminishes [11]. The effective OMR is also updated throughout the gaze shift at every time step according to the concurrent head velocity: as the head moves faster, the effective OMR diminishes [20]. See the online supplement for details on the effective OMR computations, including equations and empirically derived constants.
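The exact equations and constants live in the online supplement; as a rough illustration of the two dependencies described above, a sketch might look like the following, with k_off and k_vel as placeholder constants rather than the empirically derived ones:

```python
def effective_omr(base_omr: float, start_offset_contra: float,
                  head_velocity: float,
                  k_off: float = 0.5, k_vel: float = 0.0015) -> float:
    """Illustrative effective-OMR computation (degrees).

    start_offset_contra: the eyes' initial contralateral offset from a
    central orientation (0 if the eyes start centered or ipsilateral).
    head_velocity: concurrent head velocity in deg/s.
    """
    # Eyes starting farther contralateral keep the OMR near baseline;
    # eyes starting near center diminish it [11].
    offset_ratio = min(1.0, max(0.0, start_offset_contra / base_omr))
    omr = base_omr * (1.0 - k_off * (1.0 - offset_ratio))
    # Faster concurrent head motion further diminishes the OMR [20].
    return omr * max(0.0, 1.0 - k_vel * head_velocity)

print(effective_omr(45.0, start_offset_contra=30.0, head_velocity=100.0))
```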

The next component of the model is a user-defined parameter specifying the head alignment preferences of the agent (Figure 2e). Head alignment (how much individuals use their heads in performing a gaze shift) is highly idiosyncratic, and creates a differential directness of gaze at the target [13]. A parameter value of 0% for head alignment indicates that once the eyes have reached the gaze target, the head stops moving, resulting in gaze at the target out of the corner of the eyes. At a 100% parameter value, on the other hand, the head continues rotating until it is completely aligned with the target, with concomitant compensatory eye movement to keep the eyes directed toward the target, resulting in gaze with the eyes and head both fully directed toward the target. Head alignment values between these two extremes can be computed using spherical linear interpolation between the two corresponding rotational values. This parameter can be manipulated to create affiliative gaze (keeping high head alignment with a conversational partner and low head alignment with everything else) or referential gaze (keeping high head alignment with information being referred to in the environment and low head alignment with the conversational partner). Head alignment is the most directly controllable input parameter of the model, and as such is the parameter we focus on in our experimental methodology.
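The interpolation itself is standard; a self-contained sketch, where q_min is the minimal head rotation needed for the eyes to reach the target and q_full is full alignment with the target (the quaternion representation and helper names are ours):

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z),
    used to blend the head's end rotation between 0% and 100% alignment."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                          # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-6:                       # rotations nearly identical
        return q1
    w0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

# Example: blend between no head rotation and a 40-degree yaw toward the
# target, at 75% head alignment.
half = math.radians(40.0) / 2.0
q_min = (1.0, 0.0, 0.0, 0.0)                         # head stays put
q_full = (math.cos(half), 0.0, math.sin(half), 0.0)  # 40 deg about y axis
print(slerp(q_min, q_full, 0.75))
```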

Once the eyes have reached their target, the vestibulo-ocular reflex (VOR) keeps them locked to the target while the head finishes its rotation (Figure 2f). This component of gaze, the final one in our model, is handled by rotating the eyes in the direction opposite to the head's motion as the head completes its portion of the gaze shift, keeping the eyes fixated on the gaze target even as the head keeps rotating.
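Putting the pieces together, the following yaw-only sketch simulates a single gaze shift with all of the components above: head latency (B), the velocity profile (C), an OMR clamp (D), head alignment (E), and the VOR (F). The peak velocities, the velocity floor, and the constants are illustrative stand-ins, not the model's published values:

```python
def shift_velocity(g, v_max):
    """Model velocity profile: peak v_max at g = 0.5, zero at g = 1."""
    return 2.0 * v_max * g if g < 0.5 else 4.0 * v_max * (g - 1.0) ** 2

def simulate_shift(amplitude=60.0, head_align=1.0, head_latency=0.05,
                   omr=45.0, dt=1.0 / 60.0):
    """Yaw-only gaze shift of `amplitude` degrees. Positive head_latency
    (seconds) means the eyes start first; negative means the head does.
    Returns the final (head, eye-in-head) rotations in degrees."""
    min_head = max(0.0, amplitude - omr)        # head motion the OMR forces
    head_goal = min_head + head_align * (amplitude - min_head)     # (E)
    v_eye_max = 8.0 * amplitude                 # illustrative peak, deg/s
    v_head_max = 3.0 * max(head_goal, 1.0)      # illustrative peak, deg/s
    eye_in_head = head = t = 0.0
    while (head < head_goal - 1e-3
           or head + eye_in_head < amplitude - 1e-3) and t < 10.0:
        t += dt
        if t >= max(0.0, head_latency) and head < head_goal:       # (B)
            v = max(shift_velocity(head / head_goal, v_head_max),
                    0.05 * v_head_max)  # floor avoids a zero-velocity start
            head = min(head_goal, head + v * dt)                   # (C)
        if t >= max(0.0, -head_latency):                           # (B)
            gaze = head + eye_in_head
            if gaze < amplitude - 1e-3:
                v = max(shift_velocity(gaze / amplitude, v_eye_max),
                        0.05 * v_eye_max)
                eye_in_head = min(omr, amplitude - head,
                                  eye_in_head + v * dt)            # (D)
            else:
                eye_in_head = amplitude - head                     # (F) VOR
    return head, eye_in_head

# Corner-of-eye gaze vs. fully head-aligned gaze toward the same target:
print(simulate_shift(head_align=0.0))  # minimal head motion, eyes near OMR
print(simulate_shift(head_align=1.0))  # head on target, eyes recentered
```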

Ancillary components of our model include a blink controller for generating both random and gaze-evoked blinking, as described by Peters [45], and idle gaze behavior. When the agent is not actively engaged in a gaze shift following our model, the eyes are controlled by an implementation of the model presented in Lee et al. [32]. This implementation of subtle random eye movements has been shown to dramatically increase the realism of the character. The virtual character's eyelids also move with vertical shifts of the eyes, rising as the eyes pitch upward and dropping as the eyes pitch downward [48]. Lastly, we implemented ambient facial and bodily cues for our virtual characters to prevent the unnatural rigidity inherent to a static character. When the character is in an idle state, these cues include smiles and slight movements of the arms and the head.

Model Validation
Our model development was followed by an empirical evaluation of the communicative accuracy and perceived naturalness of gaze shifts generated by our model. We compared gaze shifts generated by our model against those generated by a baseline model and those displayed by a human.

Participants – Ninety-six participants (50 males and 46 females) took part in the validation study. The participants were recruited through Amazon.com's Mechanical Turk online marketplace, following crowd-sourcing best practices to minimize the risk of abuse and to achieve a wide range of demographic representation [24, 28]. Participants received $2.50.

Study Design – In the study, participants were shown a series of videos in which either an animated virtual character or a human confederate shifted gaze toward one of sixteen objects arranged on a desk. This simplified scenario allowed us to focus our evaluation on the effectiveness and naturalness of gaze shifts, while minimizing contextual and interactional factors and facilitating the matching of animated and real-world conditions. Participants observed the agent from the perspective of a collaborator seated across from the agent or the human confederate. The objects on the desk were separated into four groups, distinguished by color and shape. Still images from the videos are shown in Figure 3.

Figure 3. Still images from the videos presented to the participants: male and female human confederates, and agents with male and female appearance.

The study followed a two-by-four split-plot design. The factors were gender (with the participant matched to the agent's gender), varying between participants, and model type, varying within participants. The model type independent variable included gaze shifts generated by a baseline gaze model from Peters [45] and those produced by our model, as well as two control conditions. In the first control condition, a virtual agent with a male or female appearance maintained gaze toward the participant (i.e., the camera) without producing any gaze shifts at all. In the second control condition, a male or female human confederate produced gaze shifts toward the objects on a desk in front of him or her. The order in which the conditions were presented to each participant was counterbalanced.

Procedure – Each participant was shown 32 videos of a virtual character or human. In the videos, the agents or the confederates gazed toward the participant (i.e., the camera), announced that they were about to look toward an object of a specific color on the table, shifted their gaze toward the object, and moved their gaze back toward the participant. Following each video, the participants answered a set of questions that measured the dependent variables. Participants evaluated each condition four times in a stratified order. Each video was approximately 10 seconds long, and the overall study lasted around 20 minutes.

Measures – The validation study used two dependent variables: communicative accuracy and perceived naturalness. Communicative accuracy was measured by capturing whether participants correctly identified the object toward which the agent's gaze shift was directed. Perceived naturalness was measured with a four-item scale of high reliability (Cronbach's α = .94), comprising naturalness, humanlikeness, lifelikeness, and realism.

Results – We conducted a mixed-model analysis of variance (ANOVA) on the data to determine the effect that different gaze models had on how accurately participants identified the object that the agents or the human confederates looked toward and on the perceived naturalness of the gaze shifts. Overall, model type had a significant effect on communicative accuracy, F(7, 46.67) = 16.77, p < .001, and perceived naturalness, F(7, 46.67) = 151.03, p < .001.

Figure 4. Results from the communicative accuracy, perceived naturalness, and realism measures. Conditions are, from left to right: agent control (no gaze shifts, only idle gaze), human control (video of a gazing human), agents shifting their gaze using a previously published baseline gaze model [45], and agents shifting their gaze using our model of gaze.

Pairwise comparisons found no significant differences in the communicative accuracy of the gaze shifts produced by our model and those produced by human confederates, F(1, 14.91) = 0.03, p = ns. Similarly, no differences in accuracy were found between our model and the baseline model, F(1, 1958) = 0.17, p = ns. The results suggest that the gaze shifts generated by our model are just as accurate as those performed by human confederates and those generated by the baseline model.

Comparisons across conditions showed that participants rated gaze shifts generated by our model as marginally more natural than those generated by the baseline model, F(1, 1967) = 3.34, p = .068. Comparisons over realism (one of the items included in the perceived naturalness scale) found that gaze shifts produced by our model were rated as significantly more realistic than those generated by the baseline model, F(1, 1963) = 5.75, p = .017. Results on the communicative accuracy, perceived naturalness, and realism measures are illustrated in Figure 4.

EXPERIMENTAL EVALUATION
The validated model of gaze shifts provides parameters that allow us to explore how manipulating these low-level parameters can achieve valuable high-level effects. The main study of this paper explores manipulating the head alignment parameter to create more affiliative or more referential gaze cues, with the goals of increasing feelings of connection with the agent and increasing learning, respectively. The experiment was carried out in the context of an educational scenario, with the virtual agent serving as a lecturer to the human participant. The virtual agent taught the participant about a specific subject from ancient Chinese history. A map of China was used as a reference to facilitate the descriptions of geographical locations. An example of the interface presented to participants is shown in Figure 1.

Hypotheses
We designed an experiment to test three hypotheses. First, we wished simply to confirm that the presence of an agent elicits better learning than audio alone.

Hypothesis 1 – The presence of an embodied agent will result in better recall performance than only hearing audio with no accompanying agent.

The literature strongly supports the first hypothesis: teachers who gaze at their students lead the students to learn better [41, 47]. We seek to show that these same effects can be achieved by virtual agents using our gaze model.

Hypothesis 2 – An agent which employs more affiliative gaze (maintains higher head alignment with the participant) will garner higher subjective evaluations than one which uses more referential gaze (maintains higher head alignment with the information being referred to).

Figure 5. A diagram of the study setup showing the range of the agent's head movements for each gaze condition. The agent's eye motions (not depicted) always move the full distance from eye contact with the participant to eye contact with each map location being referred to.

This hypothesis is supported by the fact that people are perceived as more intelligent, more trustworthy, and friendlier if they make more direct eye contact [1, 14]. We wish to extend this work to show that head alignment influences the positive subjective effects of mutual gaze, and that our model is capable of achieving these effects.

Hypothesis 3 – Viewing an agent using more referential gaze should result in better recall performance. This will especially be true when the information to be recalled relies on building associations to objects in the environment.

Referential gaze helps build associations between information and objects in the environment. In communication between a human and a virtual agent with a shared environment, being able to perceive the agent's eye gaze toward different objects in the environment reduces the time and verbal communication needed for grounding references [34]. We believe that head alignment can strengthen or weaken this effect, since it plays a substantial role in shifting a viewer's visual attention [23]. When the agent's head is aligned more fully with the object under consideration, this should serve as a stronger referential gaze cue than if the head is not aligned. We wish to show that our gaze model can achieve this effect.

Participants
We recruited 20 participants for our study (10 males and 10 females), with ages ranging from 19 to 65 (M = 27.8, SD = 14.3). Fifteen participants were students and five were working members of the community. All were native English speakers. Student participants came from a number of different fields, including psychology, engineering, and business. All participants were recruited using a combination of campus flyers and online student job forums.

Study Design
We employed a within-subjects design for this study. Our experiment involved one factor, type of gaze, with four levels, each defined as follows (a parameterization sketch in terms of the model's head alignment parameter follows the list):

• Audio: The agent is shown briefly during its introduction. Then the lights in the virtual scene are extinguished for the duration of the lecture, except for a spotlight on the map, so only the map can be seen and the agent is in complete darkness. At the end of the lecture, the scene lights turn back on for the agent to give its instructions to the participant on taking the subjective evaluation and quiz for the lecture.

• Affiliative: The agent keeps its head aligned with the participant as much as possible during the lecture. When the agent is making direct eye contact with the participant, the head is fully aligned with the participant. When the agent shifts its eye gaze to refer to something on the map, the head aligns as little as possible with the map (only as much as the agent's OMR requires) so as to keep the head aligned toward the participant.

• Referential: The agent keeps its head aligned with the map as much as possible during the lecture. When the agent is gazing at the map, the head is fully aligned with the map. When the agent shifts its eye gaze back to the participant, the head aligns as little as possible with the participant (only as much as the agent's OMR requires) so as to keep the head aligned toward the map. The referential condition is "more referential" (borrowing the term from linguistics) because the head maintains alignment with the information being referred to on the map, both when looking at the participant (out of the corner of the eyes) and when looking at the map (head and eyes fully aligned).

• Both: When gazing at the participant, the agent keeps its head fully aligned with the participant. When gazing at the map, the agent aligns its head fully with the map.
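To make these manipulations concrete, the conditions can be summarized as per-target settings of the model's head alignment parameter. This mapping is our reading of the descriptions above, expressed as a hypothetical configuration rather than the study's actual code:

```python
# Head alignment used when the agent gazes at each target:
# 0.0 = corner-of-eye gaze (minimal head motion), 1.0 = head fully aligned.
# "audio" hides the agent during the lecture, so no values apply.
GAZE_CONDITIONS = {
    "audio":       None,
    "affiliative": {"participant": 1.0, "map": 0.0},
    "referential": {"participant": 0.0, "map": 1.0},
    "both":        {"participant": 1.0, "map": 1.0},
}
```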

Figures 5 and 6 illustrate the differences in head motion among the affiliative, referential, and both conditions. It should be stressed that in all three of these conditions, the agent's eyes move the full distance from eye contact with the participant to each map location being referred to; the agent's eyes always converged on each map location while it was being referred to.

Each participant viewed four lectures given by four different virtual agents, each utilizing a different gaze condition. Thus, every participant was exposed to all four gaze conditions. All agents, three of which can be seen in Figures 1 and 6, were specifically designed to look and sound androgynous so as to eliminate gender biases as much as possible. Each agent always gave the same lecture, but pairings between agent and gaze condition were stratified across participants for balance.

Figure 6. A visual depiction of an agent in different gaze conditions: (A) Affiliative, looking at the map; (B) Affiliative (and Both), looking at the participant; (C) Referential (and Both), looking at the map; (D) Referential, looking at the participant.


Figure 7. The physical setup of the experiment.

The order in which the lectures (and, accordingly, the gaze conditions) were presented to the participants was completely randomized to offset any bias due to fatigue. Lectures were also kept informationally distinct in an attempt to decrease learning bias. Each lecture was carefully controlled to be nearly the same length, approximately three minutes.

During each lecture, the agent makes eleven gaze shifts to the map, each followed shortly by a gaze shift back to the participant. The agent fixates its gaze on different targets on the map as they are mentioned, e.g., while giving facts about a Chinese city visible on the map. The timing of these gaze shifts follows Griffin [19] and Meyer et al. [38], who indicate that a deictic gaze shift should occur 800–1000 ms before the object being gazed at is mentioned in speech.
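Scheduling such shifts from a time-aligned script is straightforward; a small sketch, with hypothetical input (mention times in seconds from lecture start):

```python
import random

def schedule_deictic_shifts(mention_times_s):
    """Start each gaze shift toward a map target 800-1000 ms before the
    target is mentioned in speech, per Griffin [19] and Meyer et al. [38]."""
    return [t - random.uniform(0.8, 1.0) for t in mention_times_s]

print(schedule_deictic_shifts([5.2, 14.0, 23.7]))
```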

Implementation – The experiment was implemented using a custom framework built on top of the Unity game engine (www.unity3d.com). The gaze model was implemented as a Unity script. The characters were created using a commercially available parametric base figure. Audio was pre-recorded and pitch-shifted to sound androgynous.

Procedure
The experiment was conducted in a closed study room with no outside distraction. Participants entered the experiment room and were asked to sit at a table with a single computer monitor and mouse. The monitor was a 32-inch flat panel display, allowing the virtual agent representation (only head and shoulders) to be near life size. This setup can be seen in Figure 7. The experimenter gave the participant a brief description of what they would be asked to do in the experiment, and then asked them to review and sign a consent form. The experimenter told the participants that they were going to be listening to, and quizzed on, four short lectures from different virtual lecturers, each on a topic pertaining to ancient China.

After consenting, participants were told that they could press the start button on the screen after the experimenter left the room. Upon pressing the start button, the first lecturer (randomly chosen by the software) began introducing itself and giving its lecture. Upon completion of the lecture, the screen went black, and participants filled out on paper a subjective evaluation of the lecturer they had just viewed, followed by a quiz on the material presented in the lecture. During the initial instructions, the experimenter made it clear that the quiz was to be filled out after the subjective evaluation had been completed; this allows the subjective evaluation to double as a distractor task, strengthening any subsequent recall measures. All participants were monitored via closed-circuit camera by the experimenter to ensure that these instructions were followed, and no participants were observed to neglect them.

After filling out the subjective evaluation and quiz, the participant could return to the monitor and click the on-screen button to begin the next lecture. This process was repeated until all four lectures had been viewed, rated, and quizzed. At this point, the experimenter re-entered the room with a short demographic questionnaire. Following completion of the questionnaire, the experimenter debriefed and paid the participant. The total experiment took approximately 30 minutes, and participants were paid $5.

Measures
Our experiment involved one independent variable, type of gaze, manipulated within participants. The dependent variables included objective measurements evaluating participants' recall of the lecture material and subjective measurements evaluating participants' impressions of the virtual agent.

The objective measurement of recall involved quizzes taken by all participants following each lecture. Each quiz had ten short-answer questions, split into three categories (not visible to the participant). The first category included three questions that asked about information not directly associated with information on the map; thus, referential gaze should not have had an effect on the recall of this information. An example question in this category would be, "In what year did the Jin dynasty overtake control of China?" The Jin dynasty was not represented on the map during the associated lecture. The second category, consisting of four questions, asked about purely spatial information. For example, one question was "Which of the Three Kingdoms dynasties extended farthest south?" This question is only answerable by having studied the map. The third category, including the remaining three questions of the quiz, relied on building associations between verbal lecture content and spatial map locations. For example, "Give one reason why Emperor Wen declared the city of Luoyang to be his capital city." Referential gaze was expected to make the biggest impact on questions from the latter two categories. Questions from all three categories were randomly permuted to create the final ten-question quiz.

The subjective measurements were split into six broad indicators. Each question within the indicators took the form of a seven-point rating scale. Item reliability (Cronbach's α) was acceptable or better for all except skilled communicator.

1. Likeability: Four-item measure of how likeable the participant found the agent to be. Includes questions on perceived friendliness and helpfulness. (Cronbach's α = .78)
2. Rapport: Six-item measure of how much the participant felt feelings of rapport in relation to the agent. Questions asked, e.g., how well the participant felt he or she connected with the agent and how willingly he or she would disclose personal information to the virtual agent following the lecture. (Cronbach's α = .84)
3. Trust: Two-item measure of how trustworthy the participant perceived the agent to be. Includes ratings of trustworthiness and honesty. (Cronbach's α = .72)
4. Intelligence: Three-item measure of how intelligent the participant perceived the agent to be. Includes ratings of competence and expertise. (Cronbach's α = .84)
5. Skilled Communicator: Three-item measure of how effective at conveying lecture material the participant perceived the agent to be. Item reliability was questionable. (Cronbach's α = .62)
6. Engagement: Six-item measure of how engaged the participant felt during the lecture. Includes personal ratings of focus, attentiveness, and satisfaction. (Cronbach's α = .89)

Figure 9. Objective measure (recall measured by post-lecture quiz). On the left is total quiz performance; on the right is quiz performance when considering only a subset of the questions: those dealing with spatial information and building associations.

Manipulation Checks
To check that our gaze type manipulations were noticed across the visible-agent conditions, we asked the participants to rate from 0% to 100% how much they felt the agent was paying attention to them and to the map. We expected that participants would feel more attended to in the affiliative and both conditions than in the referential condition. Conversely, we expected participants to feel that the agent attended to the map more in the referential and both conditions than in the affiliative condition.

Results
Analysis of our data was conducted using a repeated measures analysis of variance (ANOVA). We used the Tukey-Kramer HSD method to control the experiment-wise error rate in all post-hoc tests. Our analysis started with the manipulation check. We found that participants felt more attended to in the affiliative gaze condition than in the referential condition, F(1, 69) = 12.53, p < .001. They also felt more attended to in the both condition than in the referential condition, F(1, 69) = 4.37, p = .040. Finally, participants felt that the agent attended to the map more in the referential condition than in the affiliative condition, F(1, 69) = 7.75, p = .007. This difference was not significant for the both condition versus affiliative, F(1, 69) = 0.37, p = .55; however, participants rated the agent as attending more to the map in the referential condition than in the both condition, F(1, 69) = 4.75, p = .033.

Next we analyzed the objective results in the form of recall quiz scores (Figure 9). In terms of overall score, the audio condition resulted in significantly lower recall than the other three visible-agent conditions: affiliative gaze, F(1, 69) = 19.38, p < .001; referential gaze, F(1, 69) = 37.78, p < .001; and both, F(1, 69) = 31.91, p < .001. When considering only the seven (out of ten) questions that dealt with purely spatial map information or with building associations between verbal lecture content and locations on the map, we found that the referential gaze condition resulted in significantly better recall performance than the affiliative gaze condition, F(1, 69) = 5.62, p = .021. The increase in recall performance in the both condition over affiliative did not quite reach significance, F(1, 69) = 2.58, p = .11. Referential and both were not significantly different, F(1, 69) = 0.59, p = .45.

Finally, we analyzed the subjective measures. Here we observed that on the likeability scale, the referential condition was rated lower than both the affiliative condition, F(1, 69) = 58.86, p < .001, and the both condition, F(1, 69) = 52.65, p < .001. On the rapport scale, the referential condition was also rated lower than both affiliative, F(1, 69) = 13.25, p < .001, and both, F(1, 69) = 7.95, p = .006. The trust scale showed similar results, with the referential condition rated lower than both affiliative, F(1, 69) = 8.63, p = .005, and both, F(1, 69) = 5.55, p = .021. The intelligence scale again showed the referential condition rated lower than affiliative, F(1, 69) = 11.38, p = .001, and both, F(1, 69) = 10.99, p = .002. The skilled communicator scale yielded no significant results between conditions, but the engagement scale showed the referential condition rated significantly lower than the affiliative condition, F(1, 69) = 14.53, p < .001, and the both condition, F(1, 69) = 17.16, p < .001. These results are summarized in Figure 8.

Figure 8. Results for subjective evaluations (likeability, rapport, trust, intelligence, skilled communicator, and engagement) by gaze condition.

DISCUSSION
The purpose of the experiment was to demonstrate how subtle changes in gaze behavior can lead to significant high-level effects, with the goal of improving learning and positive feelings of affiliation. To do this, we manipulated the head alignment parameter of our model, implemented on various virtual lecturing agents. We showed that by manipulating just this parameter for a virtual agent shifting its gaze from the participant to an object in the scene (i.e., the map of China) and back again, the agent could achieve very different subjective and objective effects, including the participant's feelings toward the agent and information recall, respectively.

Confirming our first hypothesis, the mere presence of an agent resulted in better recall performance than audio alone, no matter which gaze condition was used. For five out of six subjective scales, the affiliative and both gaze conditions achieved better ratings than the referential condition. This leads us to accept our second hypothesis, showing that human listeners prefer when the virtual agent speaker aligns its head fully with them while speaking, rather than looking out of the corners of its eyes with its head aligned toward something else. The skilled communicator scale did not exhibit this effect, which could mean that participants thought the agent did a similarly decent job of communicating the lecture content no matter which gaze condition was used. We also found strong support for our third hypothesis, showing that referential gaze results in better participant recall than affiliative gaze. By keeping its head aligned with the map as much as possible, the agent compelled the participant to concentrate more on the map and learn the spatial locations better, while building associations between verbal lecture content and those same locations. As a reminder, the agent used the same amount of gaze in both the affiliative and referential conditions; hence, in both conditions the participant was able to benefit from gaze as an arousal stimulus to learn the lecture material better. The difference lies in the way each gaze shift was performed, showing that not all ways of gazing are equal and that subtle changes can create significant outcomes. Overall, this study showed that our model is capable of achieving targeted outcomes.

Design Implications
Embodied agents' potential in computer interfaces is increasing with the ever-growing popularity of interactive games and computer-based tools for learning, motivation, and rehabilitation. However, agent designers need controllable models of interactive behavior and evidence that these models effectively improve outcomes such as learning and rapport. Our work provides one such example in the context of gaze.

We have shown that it is possible to implement in virtual agents the very subtle gaze cues that humans use to great effect in social situations, and that manipulating these cues can achieve significant objective and subjective high-level effects in a human interacting with the agent. Designers of virtual agents can use these subtle gaze behaviors, such as head alignment, to reach different desired outcomes. If the agent designer wants human interlocutors to pay more attention to specific objects in the environment, possibly to learn more about them, the agent could be programmed to use high head alignment when gazing at those objects. Similarly, if the agent designer wants the agent to build a stronger relationship with the human interlocutor, increasing feelings of, e.g., rapport and trust, the agent should be programmed to use high head alignment when gazing toward the human. Our model of gaze behavior offers a simple and effective means to control the low-level gaze parameters found in physiological research. Virtual agent designers can use and build on this model to create rich, compelling gaze behaviors that accomplish the high-level effects they wish to achieve.

Limitations and Future Work
Our gaze mechanisms are currently focused on the contributions of the head and eyes alone. Future extensions should consider the movement of other body parts, such as employing the neck in performing head movements. While the work presented here explored our gaze shift model on very humanlike characters, we intend to explore its application to different character designs, e.g., stylized cartoons, and embodiments, e.g., storytelling robots [39]. We also intend to run future studies exercising different parameters of the model (e.g., head latency) and measuring the effects of subtle gaze cues in different interaction modalities, such as VR and mobile devices.

CONCLUSIONS
Gaze is a complex and powerful cue. Through subtle changes in gaze, people can achieve a wide range of social and communicative goals, affecting their interaction partner in a variety of ways. Gaze cues, as with all embodied communication cues, hold a strong fundamental connection with key social, cognitive, and task outcomes. This connection reveals an opportunity for designing embodied dialog with virtual agents. Designing gaze behaviors for virtual agents that can achieve specifically targeted high-level outcomes has long been a difficult problem. By creating a mechanism for synthesizing gaze shifts in a natural yet parameterized fashion, we have provided a building block for creating high-level social and communicative behaviors. In this paper we presented a model of gaze shifts validated to achieve humanlike gaze for virtual agents, and a study showing that it can achieve interesting subjective and objective effects through the manipulation of low-level gaze parameters. These subtle gaze cues are effective in the context of our gaze model due to the grounded physiological approach we took in designing it. By manipulating and combining them in different ways, we believe that virtual agents will soon have access to a rich new source of possible gaze behaviors, resulting in human-agent interactions that are more effective and rewarding.

ACKNOWLEDGMENTS
This research was supported by National Science Foundation award 1017952. We would like to thank Allison Terrell, Danielle Albers, and Erin Spannan for their help in creating the videos used in the experiments described in this paper.


REFERENCES

1. Argyle, M., and Cook, M. Gaze and mutual gaze. Cambridge University Press, Cambridge, 1976.

2. Bailenson, J., Yee, N., Merget, D., and Schroeder, R. The effect of behavioral realism and form realism of real-time avatar faces on verbal disclosure, nonverbal disclosure, emotion recognition, and copresence in dyadic interaction. Presence: Teleoperators and Virtual Environments 15, 4 (2006), 359–372.

3. Bayliss, A., Paul, M., Cannon, P., and Tipper, S. Gaze cuing and affective judgments of objects: I like what you look at. Psychonomic Bulletin & Review 13, 6 (2006), 1061–1066.

4. Beebe, S. Effects of eye contact, posture and vocal inflection upon credibility and comprehension.

5. Burgoon, J., Coker, D., and Coker, R. Communicative effects of gaze behavior. Human Communication Research 12, 4 (1986), 495–524.

6. Cafaro, A., Gaito, R., and Vilhjalmsson, H. Animating idle gaze in public places. In Intelligent Virtual Agents, Springer (2009), 250–256.

7. Cassell, J., Bickmore, T., Billinghurst, M., Campbell, L., Chang, K., Vilhjalmsson, H., and Yan, H. Embodiment in conversational interfaces: Rea. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit, ACM (1999), 520–527.

8. Cassell, J., Torres, O., and Prevost, S. Turn taking vs. discourse structure: How best to model multimodal conversation. Machine Conversations (1999), 143–154.

9. Deng, Z., Lewis, J., and Neumann, U. Automated eye motion using texture synthesis. IEEE Computer Graphics and Applications (2005), 24–30.

10. Fox, J., and Bailenson, J. Virtual virgins and vamps: The effects of exposure to female characters' sexualized appearance and gaze in an immersive virtual environment. Sex Roles 61, 3 (2009), 147–157.

11. Freedman, E., and Sparks, D. Activity of cells in the deeper layers of the superior colliculus of the rhesus monkey: evidence for a gaze displacement command. Journal of Neurophysiology 78, 3 (1997), 1669.

12. Frischen, A., Bayliss, A., and Tipper, S. Gaze cueing of attention: Visual attention, social cognition, and individual differences. Psychological Bulletin 133, 4 (2007), 694.

13. Fuller, J. Head movement propensity. Experimental Brain Research 92, 1 (1992), 152–164.

14. Fullwood, C., and Doherty-Sneddon, G. Effect of gazing at the camera during a video link on recall. Applied Ergonomics 37, 2 (2006), 167–175.

15. Garau, M., Slater, M., Bee, S., and Sasse, M. The impact of eye gaze on communication using humanoid avatars. In Proceedings of the SIGCHI conference on Human factors in computing systems, ACM (2001), 309–316.

16. Goldberg, G., Kiesler, C., and Collins, B. Visual behavior and face-to-face distance during interaction. Sociometry (1969), 43–53.

17. Goldring, J., Dorris, M., Corneil, B., Ballantyne, P., and Munoz, D. Combined eye-head gaze shifts to visual and auditory targets in humans. Experimental Brain Research 111, 1 (1996), 68–78.

18. Goossens, H., and Opstal, A. Human eye-head coordination in two dimensions under different sensorimotor conditions. Experimental Brain Research 114, 3 (1997), 542–560.

19. Griffin, Z. Gaze durations during speech reflect word selection and phonological encoding. Cognition 82, 1 (2001), B1–B14.

20. Guitton, D., and Volle, M. Gaze control in humans: eye-head coordination during orienting movements to targets within and beyond the oculomotor range. Journal of Neurophysiology 58, 3 (1987), 427.

21. Harris, M., and Rosenthal, R. No more teachers' dirty looks: Effects of teacher nonverbal behavior on student outcomes. Applications of Nonverbal Communication (2005), 157–192.

22. Heylen, D., Van Es, I., Van Dijk, E., Nijholt, A., van Kuppevelt, J., Dybkjaer, L., and Bernsen, N. Experimenting with the gaze of a conversational agent, 2005.

23. Hietanen, J. Does your gaze direction and head orientation shift my visual attention? Neuroreport 10, 16 (1999), 3443.

24. Ipeirotis, P. Demographics of Mechanical Turk. Tech. Rep. CeDER-10-01, 2010. Accessed on 10-Mar-2010 at http://hdl.handle.net/2451/29585.

25. Itti, L., Dhavale, N., and Pighin, F. Photorealistic attention-based gaze animation. In 2006 IEEE International Conference on Multimedia and Expo, IEEE (2006), 521–524.

26. Kelley, D., and Gorham, J. Effects of immediacy on recall of information. Communication Education (1988).

27. Kim, K., Brent Gillespie, R., and Martin, B. Head movement control in visually guided tasks: Postural goal and optimality. Computers in Biology and Medicine 37, 7 (2007), 1009–1019.

28. Kittur, A., Chi, E., and Suh, B. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM (2008), 453–456.

29. Lance, B., and Marsella, S. The expressive gaze model: Using gaze to express emotion. IEEE Computer Graphics and Applications 30, 4 (2010), 62–73.

30. Langton, S., and Bruce, V. Reflexive visual orienting in response to the social attention of others. Visual Cognition (1999).

31. Lee, J., Marsella, S., Traum, D., Gratch, J., and Lance, B. The Rickel gaze model: A window on the mind of a virtual human. In Intelligent Virtual Agents, Springer (2007), 296–303.

32. Lee, S., Badler, J., and Badler, N. Eyes alive. In ACM Transactions on Graphics (TOG), vol. 21, ACM (2002), 637–644.

33. Lester, J., Towns, S., Callaway, C., Voerman, J., and FitzGerald, P. Deictic and emotive communication in animated pedagogical agents. Embodied Conversational Agents (2000), 123–154.

34. Liu, C., Kay, D., and Chai, J. Awareness of partner's eye gaze in situated referential grounding: An empirical study. 2nd Workshop on Eye Gaze in Intelligent Human Machine Interaction (2011).

35. Ma, X., Le, B., and Deng, Z. Perceptual analysis of talking avatar head movements: A quantitative perspective. CHI'11 (2011).

36. Mason, M., Tatkow, E., and Macrae, C. The look of love: Gaze shifts and person perception. Psychological Science (2005), 236–239.

37. Mehrabian, A. Immediacy: An indicator of attitudes in linguistic communication. Journal of Personality 34, 1 (1966), 26–34.

38. Meyer, A., Sleiderink, A., and Levelt, W. Viewing and naming objects: Eye movements during noun phrase production. Cognition 66, 2 (1998), B25–B33.

39. Mutlu, B., Forlizzi, J., and Hodgins, J. A storytelling robot: Modeling and evaluation of human-like gaze behavior. In Humanoid Robots, 2006 6th IEEE-RAS International Conference on, IEEE (2006), 518–523.

40. Nijholt, A., Heylen, D., and Vertegaal, R. Inhabited interfaces: Attentive conversational agents that help. In Proceedings of the 3rd International Conference on Disability, Virtual Reality and Associated Technologies (CDVRAT 2000), Alghero, Sardinia (2000).

41. Otteson, J., and Otteson, C. Effect of teacher's gaze on children's story recall. Perceptual and Motor Skills (1980).

42. Parke, F., and Waters, K. Computer facial animation. AK Peters Ltd, 2008.

43. Pelachaud, C., and Bilvi, M. Modelling gaze behavior for conversational agents. In Intelligent Virtual Agents, Springer (2003), 93–100.

44. Pelz, J., Hayhoe, M., and Loeber, R. The coordination of eye, head, and hand movements in a natural task. Experimental Brain Research 139, 3 (2001), 266–277.

45. Peters, C. Animating gaze shifts for virtual characters based on head movement propensity. In 2010 Second International Conference on Games and Virtual Worlds for Serious Applications, IEEE (2010), 11–18.

46. Rickel, J., and Johnson, W. Task-oriented collaboration with embodied agents in virtual worlds. Embodied Conversational Agents (2000), 95–122.

47. Sherwood, J. Facilitative effects of gaze upon learning. Perceptual and Motor Skills (1987).

48. Steptoe, W., and Steed, A. High-fidelity avatar eye-representation. In Virtual Reality Conference, 2008. VR'08. IEEE, IEEE (2008), 111–114.

49. Tartaro, A., and Cassell, J. Authorable virtual peers for autism spectrum disorders. In Proceedings of the Combined Workshop on Language-Enabled Educational Technology and Development and Evaluation for Robust Spoken Dialogue Systems at the 17th European Conference on Artificial Intelligence, Citeseer (2006).

50. Vertegaal, R. The gaze groupware system: mediating joint attention in multiparty communication and collaboration. In Proceedings of the SIGCHI conference on Human factors in computing systems: the CHI is the limit, ACM (1999), 294–301.

51. Zangemeister, W., and Stark, L. Types of gaze movement: variable interactions of eye and head movements. Experimental Neurology 77, 3 (1982), 563–577.