
Predicting User Intent Through Eye Gaze for Shared Autonomy

Henny Admoni, Siddhartha Srinivasa
The Robotics Institute, Carnegie Mellon University

5000 Forbes Avenue, Pittsburgh, PA 15217
{hadmoni, ss5}@andrew.cmu.edu

Abstract

Shared autonomy combines user control of a robot with intelligent autonomous robot behavior to help people perform tasks more quickly and with less effort. Current shared autonomy frameworks primarily take direct user input, for example through a joystick, that directly controls the robot's actions. However, indirect input, such as eye gaze, can be a useful source of information for revealing user intentions and future actions. For example, when people perform manipulation tasks, their gaze centers on the objects of interest before the corresponding movements even begin. This implicit information contained in eye gaze can be used to improve the goal prediction of a shared autonomy system, improving its overall assistive capability. In this paper, we describe how eye gaze behavior can be incorporated into shared autonomy. Building on previous work that represents user goals as latent states in a POMDP, we describe how gaze behavior can be used as observations to update the POMDP's probability distributions over goal states, solving for the optimal action using hindsight optimization. We detail a pilot implementation that uses a head-mounted eye tracker to collect eye gaze data.

Introduction

Robot teleoperation allows people to operate in inaccessible environments or to extend their physical capabilities beyond their own limitations. However, as robots become more capable of performing real-world tasks, they also become more complex, increasing the degrees of freedom (DOFs) that must be controlled for successful teleoperation. Fully teleoperating a robot can be undesirable, for example, in assistive care, where a user's physical ability to manipulate an input device is limited (Maheu et al. 2011; Tsui et al. 2008), or in bomb disposal scenarios, where time and user fatigue can lead to catastrophe (Kron et al. 2004). Shared autonomy is one solution to this challenge of teleoperation: by combining human control of a robot with autonomous robot behavior, shared autonomy can help people accomplish tasks more quickly and with less effort.

For example, consider a user with spinal cord injury who has a wheelchair-mounted robotic arm like the Kinova JACO (Figure 1). Typically, such a robot is controlled through direct teleoperation using an interface such as a 3 degree of freedom (DOF) joystick (Maheu et al. 2011) or a brain-computer interface (Muelling et al. 2015). However, the JACO has seven DOFs, and the translation from a low-dimensional input device to high-dimensional robot control requires frequent control mode switches and careful physical manipulation. The cognitive and physical loads can be very high.

Figure 1: Shared autonomy consists of goal prediction (based on direct and indirect inputs) and robotic assistance.

Shared autonomy addresses this challenge of robot teleoperation by augmenting human control with intelligent robot behavior that seamlessly assists a user in completing their desired action with less control input. However, in shared autonomy systems, robots may not know a priori which goal their human users want to pursue. Therefore, shared autonomy requires a prediction component that continually monitors the user's control input and outputs a prediction about which goal (or goals) the robot should assist toward.

Previous work in shared autonomy has implemented this as a two-step process, first predicting the user's goal and then assisting when that goal prediction rises above a pre-specified confidence threshold (Fagg et al. 2004; Kofman et al. 2005). However, this type of predict-then-act process does not respond well to changing user goals, and the all-or-nothing assistance misses opportunities for assisting toward multiple possible goals simultaneously. Recent work improves on predict-then-act methods by modeling the user's goal as the latent state of a POMDP, and using hindsight optimization to solve the POMDP in real time (Javdani, Srinivasa, and Bagnell 2015). This system can continually monitor user goals, dynamically adjust its assistance to new predictions, and assist toward multiple goals simultaneously if possible.

Existing shared autonomy systems still rely primarily on user teleoperation input, through joysticks or other direct interfaces that send control signals to the robot. Even when the signals are indirect, for example a point-and-click interface that provides high-level action targets (Ciocarlie et al. 2012; Tsui et al. 2008), the user still explicitly communicates their intent through the input device.

In contrast, we investigate how shared autonomy systems can take advantage of the indirect signals people implicitly provide when performing their tasks. In particular, shared autonomy systems could use people's eye gaze as an additional, implicit signal to predict their goal (Figure 1). Using this indirect eye gaze signal is different from using a direct input signal controlled through an eye gaze interface. Our approach takes advantage of natural gaze behavior rather than repurposing the eyes as an input device.

Natural eye gaze is highly predictive of object goals in manipulation tasks, because gaze during task completion is rarely directed outside of the objects required for the task (Hayhoe and Ballard 2005), and it is closely linked to hand motion (Pelz, Hayhoe, and Loeber 2001). Previous work has identified intentions and actions based on gaze behavior (Huang et al. 2015; Yi and Ballard 2009) and used these predictions to provide robotic assistance in a human-robot collaboration (Huang and Mutlu 2016).

In this paper, we extend a previous shared autonomy framework (Javdani, Srinivasa, and Bagnell 2015) with indirect inputs in the form of eye gaze. Our main contribution is this formalization of gaze-based goal prediction, which can be integrated into the shared autonomy model. We describe this formalization and outline an initial data collection study to learn how eye gaze predicts goals in a shared autonomy interaction.

Related Work

This paper builds on existing shared autonomy models by describing how eye gaze input can help augment user goal predictions. Therefore, this work is relevant to both the shared autonomy domain, which tends to include machine learning and robotics, and the eye gaze domain, which includes cognitive psychology, computer vision, and human-robot interaction.

Shared Autonomy

Shared autonomy (alternatively: shared control, assistive teleoperation) combines human teleoperation of a robot with intelligent robotic assistance. In assistive care, shared autonomy systems have been developed for wheelchair-mounted robot arms that use a variety of interfaces, such as an "eye-mouse" on a computer screen (Bien et al. 2004), natural language commands (Volosyak, Ivlev, and Graser 2005), and laser pointers (Lim, Lee, and Kwon 2003). These interfaces usually provide explicit action commands directly to the robot (such as opening the gripper). In contrast, the current system uses the implicit signal from natural eye gaze behavior to predict what a user wants the robot to do.

The first step of shared autonomy is to predict the human's goal, often through machine learning methods that are trained from human demonstrations. Goal prediction has been achieved using object affordances with conditional random fields to model the spatio-temporal structure of tasks (Koppula and Saxena 2016); through a probabilistic model of observed human movements that represents intention as a latent variable (Wang et al. 2013); and using Gaussian Mixture Autoregression, which uses statistical features of 2D control inputs (mouse movements) to estimate task types and parameters (Hauser 2013).

In prior work that we build from in this paper (Javdani, Srinivasa, and Bagnell 2015), goal prediction is achieved via maximum entropy inverse optimal control (MaxEnt IOC), which derives a user's utility function from demonstrations using the principle of maximum entropy (Ziebart et al. 2008). MaxEnt IOC has been used to infer the trajectories of pedestrians (Ziebart et al. 2009) and to predict pointing trajectories in user interfaces (Ziebart, Dey, and Bagnell 2012), as well as for shared autonomy in a manipulation task (Javdani, Srinivasa, and Bagnell 2015).

The second step of shared autonomy is to provide assistance. This often involves blending the user's input with the robot assistance calculated to achieve the predicted goal. Blending methods range from allowing the robot to take full control once a threshold prediction confidence is reached (Fagg et al. 2004; Kofman et al. 2005) to combining human and robot trajectories using a motion planner (Dragan and Srinivasa 2013). Assistance can also provide task-dependent guidance (Aarno, Ekvall, and Kragic 2005) or mode switches (Herlant, Holladay, and Srinivasa 2016) that adjust based on the detected goal.

In the prior work that forms the basis for this paper (Javdani, Srinivasa, and Bagnell 2015), assistance is provided as a blending between user input and robot control, based on the confidence of the robot's goal prediction. The notable advancement of this prior work is that assistance is automatically provided to all goals simultaneously if possible, removing the need to separate goal prediction and assistance phases. In this paper, we build on this contribution by including eye gaze as an input to the goal prediction.

Eye Gaze

People's eye gaze reveals their intentions and future actions by indicating their direction of attention, particularly in manipulation tasks. In a natural pick-and-place task, hand and eye movements are strongly correlated (Pelz, Hayhoe, and Loeber 2001). When completing such tasks, gaze is focused on objects related to the task at hand, and gaze typically shifts to an object about 600 ms before that object is grasped (Hayhoe and Ballard 2005; Land and Hayhoe 2001).


Prior work has looked at predicting where people's attention will be directed. Some work takes a neurobiological approach, modeling the underlying neuronal processes to identify salient regions of a visual scene (Borji and Itti 2013; Itti and Koch 2001; Vig, Dorr, and Cox 2014). Other work uses features of body motion (for example, that there is a center bias in egocentric gaze when the camera is mounted on a viewer's head, or that gaze is predictably distributed around the hands during manipulation) to predict gaze patterns in video (Li, Fathi, and Rehg 2013).

Eye gaze has also been used to automatically recognize user intent and actions. For example, an HMM that was trained on eye-head-hand coordination, which included gaze tracking, can recognize actions on objects, like "unscrewing a jar" (Yu and Ballard 2002). A dynamic Bayes network trained with head, hand, and eye gaze has been used to predict which steps a person is engaged in during a complex (sandwich-making) task (Yi and Ballard 2009). In another sandwich-making task, an SVM-based classifier predicted the intended target of a speaker's request for ingredients using just eye gaze data with 76% accuracy (Huang et al. 2015). This classifier was then used in a system that provided anticipatory robotic assistance in real time from gaze-based intent predictions (Huang and Mutlu 2016).

In contrast with this previous work in human-robot collaboration, the system presented in this paper uses the goal prediction to provide assistance via shared control rather than to drive fully autonomous robot behavior. Additionally, the current framework is robust to scenario changes because it does not need to be trained on object appearances or locations. Instead, it uses features of the scene (namely, object distances from gaze fixations) to predict the user's goal.

Approach

In this section, we summarize the formalization for goal prediction and assistance from previous work (Javdani, Srinivasa, and Bagnell 2015) and define the addition of eye gaze as an input. In summary, there exists a set of discrete goals (e.g., grasp poses for a set of objects on a table). A human user supplies both direct and indirect inputs to the robot (e.g., joystick control and eye gaze, respectively) to achieve one of these goals (e.g., grasping the target object). The shared autonomy system begins with no knowledge of the user's particular goal, but predicts that goal through the user's inputs. Simultaneously, it selects robot actions that, given the goal prediction, minimize the expected cost of completing that goal.

Predicting Human Goals

The robot has continuous states x ∈ X (e.g., position, velocity) and continuous actions a ∈ A (e.g., velocity, torque). The robot is modeled as a deterministic dynamical system with transition function T : X × A → X.

A user can supply continuous direct inputs j ∈ J through a control interface (e.g., joystick commands), which map to robot actions through a known deterministic function D : J → A. The function D corresponds to a user fully teleoperating the robot.

In this paper, we define a second set of (possibly discrete) indirect user inputs e ∈ E produced through behavior like eye gaze. This indirect input is not used to drive the robot, so there is no transition function analogous to D. To simplify the goal prediction, we assume j and e are independent. The joint pair of user actions is denoted as (j, e) = u ∈ U.

There exists a known, discrete set of goals g ∈ G, one of which is the user's target. To predict the user's next action given this target, the system has a stochastic user policy for each goal, π^u_g(x) = p(u | x, g), which produces a distribution over actions for each robot state x and goal g. This user policy is typically learned from demonstration. Following previous work (Javdani, Srinivasa, and Bagnell 2015), we model this user policy using the MaxEnt IOC framework (Ziebart et al. 2008), which assumes the user is optimizing a cost function for their target goal g, C^u_g : X × U → R. Because the user's goal is known to themselves, the user model corresponds to a Markov Decision Process (MDP) for the specified goal, defined by the tuple (X, U, T, C^u_g).

Though the user knows their own goal, the system does not know the user's goal. Therefore, we model the system's knowledge as a Partially Observable Markov Decision Process (POMDP), where the user's goal is the unobservable state. A POMDP is similar to an MDP in that actions taken in one state lead to a new state as defined by the transition function, but it differs from an MDP in that we cannot fully observe the current state. Instead, the POMDP maintains a distribution over states, known as the belief b, and it maps beliefs (rather than states) to actions.

The system state s ∈ S is now defined by both a robot state and a user goal, s = (x, g), and S = X × G. We assume the robot state is known, and uncertainty is entirely over the user's goal. The POMDP transition function T : S × A → S corresponds to transitioning between robot states but maintaining the same user goal.

Observations in our POMDP correspond to user inputs u. Our POMDP has an observation model Ω that infers a probability distribution over states s given a sequence of observations, which are time-linked state-input pairs (s_t, u_t). Since uncertainty is only over user goals, inferring the probability distribution over states is equivalent to inferring the probability over user goals. Let ξ_{0→t} = {s_0, u_0, . . . , s_t, u_t} be a sequence of observations from time 0 to time t. To compute the probability of a goal given the observed sequence up to time t, we apply Bayes' rule:

p(g | ξ_{0→t}) = p(ξ_{0→t} | g) p(g) / Σ_{g'} p(ξ_{0→t} | g') p(g')
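For concreteness, the following Python sketch computes this posterior over a discrete goal set. It is an illustration under our own assumptions, not the system's implementation: the function names are ours, and the per-step likelihood p(u | x, g), which the full framework derives from the MaxEnt IOC user policy, is passed in as a hypothetical stand-in function.

import numpy as np

def goal_posterior(goals, prior, trajectory, likelihood):
    """Bayes-rule update of p(g | xi_{0->t}) over a discrete goal set.

    goals:      list of goal identifiers
    prior:      prior probabilities p(g), in the same order as goals
    trajectory: sequence of (state, user_input) pairs (s_t, u_t)
    likelihood: function (u, x, g) -> p(u | x, g); a stand-in for the
                MaxEnt IOC user policy described above
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    for x_t, u_t in trajectory:
        for i, g in enumerate(goals):
            # accumulate log p(u_t | x_t, g) for each candidate goal
            log_post[i] += np.log(likelihood(u_t, x_t, g) + 1e-12)
    post = np.exp(log_post - log_post.max())  # work in log space for stability
    return post / post.sum()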

To select an action, the system uses a cost function C^r : S × A × U → R, which defines the cost of taking robot action a from state s when the observed user input is u. The POMDP is thus defined by the tuple (S, A, T, C^r, U, Ω).

Modeling Eye Gaze

To integrate gaze into this shared autonomy model, we extend the observation model from our POMDP to include gaze input. Our challenge is to formalize how eye gaze predicts goals. Given some sequence of gaze inputs over time, η_{0→t} = {e_0, . . . , e_t}, we want a probability distribution over goals. In other words, we aim to find p(g | η_{0→t}) for each goal g ∈ G. Because we assume that direct inputs j and indirect inputs e are independent, η_{0→t} is independent from ξ_{0→t}.

In our formulation, gaze is collected from a head-mounted eye tracker. Thus, gaze inputs e = (I, z) consist of a 2D image I, containing a first-person view of the world, and a 2D point of gaze fixation z on the image. Object locations in the image are known.
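As a concrete (and purely illustrative) representation of this observation, the sketch below bundles the world-camera frame, the fixation point, and the known object locations; the field names and types are our assumptions rather than the system's data format.

from dataclasses import dataclass
from typing import Dict, Tuple
import numpy as np

@dataclass
class GazeObservation:
    """One indirect input e = (I, z) from the head-mounted tracker."""
    image: np.ndarray                               # first-person view I (H x W x 3)
    fixation: Tuple[float, float]                   # gaze point z in image coordinates
    object_centers: Dict[str, Tuple[float, float]]  # known object locations in the image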

We first calculate a probability distribution over a single image at time t, p(g | I_t), for all goals. Developing a model for goal prediction from gaze is non-trivial. Unlike direct joystick input, which (in an ideal case) continually moves the robot end effector closer to the target, eye gaze behaves more stochastically. People look at their goals, but they also use gaze to monitor obstacles, confirm their body position, and engage socially.

Despite these complexities, eye gaze does reveal goals during task-based action. People's gaze is temporally linked to the task at hand, and task-related objects are fixated with the eyes even before the hand starts moving toward them (Land and Hayhoe 2001). When operating on a sequence of objects, people shift their gaze toward the next object in the sequence before they finish using the previous one, revealing future action. For example, gaze will shift away from an object that is about to be grasped just as the hand closes around it (Johansson et al. 2001). Objects that are not involved in the task are rarely gazed at (Hayhoe and Ballard 2005). People are skilled at using gaze to predict the future actions of others: when a person looks at an object as they begin a verbal reference, viewers respond quickly to identify that object; when gaze is occluded, that response time increases significantly (Boucher et al. 2012).

Building on this idea that task-based gaze is primarily focused on an object related to the task, we devise a simple model of how eye gaze predicts goals. More complex models could be learned, which is a planned extension of this work. In our simple model, the likelihood of an object being the goal decreases with its distance from the center of a gaze fixation. Formally,

p(g_i | I_t) ∝ exp(−dist(o_i, z))

where o_i is a particular object, p(g_i) is the probability of that object being the goal, and dist(o_i, z) is the Euclidean distance in image coordinates from the center of the object o_i in the image to the gaze point z. In other words, the likelihood that an object is the goal falls off exponentially with its distance from the gaze fixation. Figure 2 visualizes this probability distribution as a heat map averaged over several nearby fixations over several seconds.
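A minimal sketch of this per-frame model follows, assuming object centers have already been detected in image coordinates. The optional length-scale parameter is our addition for practical tuning and is not part of the formula above; with scale = 1.0 the sketch reduces to the raw distance model.

import numpy as np

def gaze_goal_distribution(object_centers, fixation, scale=1.0):
    """Per-frame goal distribution p(g_i | I_t) proportional to exp(-dist(o_i, z)).

    object_centers: dict mapping object/goal id -> (u, v) image coordinates
    fixation:       gaze point z = (u, v) on the same image
    scale:          divides the pixel distance; 1.0 recovers the formula above
    """
    z = np.asarray(fixation, dtype=float)
    scores = {g: np.exp(-np.linalg.norm(np.asarray(c, dtype=float) - z) / scale)
              for g, c in object_centers.items()}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}

# Example with hypothetical pixel coordinates:
# gaze_goal_distribution({"mug": (320, 240), "pitcher": (500, 260)}, (330, 245))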

Because eye gaze does not solely fixate on the target object, each individual image frame I_t may have widely varying probability distributions. We can use a smoothing function like a Kalman filter to add the probability distribution from each new image to the overall probability distribution p(g | I), to account for noise in the sensors and in the gaze behavior.
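As one simple stand-in for such smoothing (we sketch exponential smoothing rather than the Kalman filter mentioned above; alpha is a tuning assumption of ours):

def smooth_gaze_distribution(running, current, alpha=0.1):
    """Blend the per-frame distribution p(g | I_t) into a running estimate p(g | I).

    running: dict goal -> accumulated probability (initialize with the first frame)
    current: dict goal -> probability from the newest frame
    alpha:   weight on the new frame; smaller values smooth more aggressively
    """
    goals = running.keys() | current.keys()
    blended = {g: (1 - alpha) * running.get(g, 0.0) + alpha * current.get(g, 0.0)
               for g in goals}
    total = sum(blended.values())
    return {g: v / total for g, v in blended.items()}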

Figure 2: A heat map showing the probability distribution centered around the gaze target.

Once we have established the probability distribution over goals based on eye gaze, we combine this with the probability distribution over goals based on joystick input to attain the joint probability distribution based on inputs, p(g | ξ_{0→t}, η_{0→t}). We currently use a simple product,

p(g | ξ_{0→t}, η_{0→t}) = p(g | ξ_{0→t}) p(g | η_{0→t})

though more sophisticated methods with weights could be learned from demonstration. This new joint probability is then used to solve for the optimal action of the POMDP using hindsight optimization.
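A minimal sketch of this combination step, assuming the two dictionary-style distributions cover the same goal set; the renormalization is our assumption so that the result is again a proper distribution.

def fuse_goal_distributions(p_direct, p_gaze):
    """Joint goal distribution via the simple product
    p(g | xi, eta) proportional to p(g | xi) * p(g | eta), renormalized over goals."""
    product = {g: p_direct[g] * p_gaze[g] for g in p_direct}
    total = sum(product.values())
    return {g: v / total for g, v in product.items()}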

Selecting Assistive Actions via Hindsight Optimization

Once we've established the distribution over goals, solving for the optimal action is performed as in previous work (Javdani, Srinivasa, and Bagnell 2015), but using the joint probability distribution. The optimal solution for the POMDP is a robot action that minimizes the expected accumulated cost E[Σ_t C^r(s_t, a_t, u_t)]. Solving this POMDP is computationally intractable, however, so we use the QMDP approximation (Littman, Cassandra, and Kaelbling 1995), also known as hindsight optimization (Chong, Givan, and Chang 2000; Yoon et al. 2008), to select actions.

Intuitively, the process assumes that full observability will be attained at the next timestep, and estimates the cost-to-go of the belief based on this assumption. This allows for a computationally efficient system even in continuous state and action spaces.

The cost-to-go is defined by the action-value function Q(b, a, u), which returns the cost of taking action a when in belief state b with user input u, and acting optimally from that point forward. The QMDP approximation (Littman, Cassandra, and Kaelbling 1995) is

Q(b, a, u) = Σ_g b(g) Q_g(x, a, u)

where Q_g(x, a, u) represents the estimated cost-to-go if the robot in state x with user input u took action a. Because uncertainty is only over the user goals g ∈ G,

b(s) = b(g) = p(g | ξ_{0→t}, η_{0→t})


Figure 3: Sequence of still frames from the gaze tracker image stream. User gaze (green dot) monitors spout position and pitcher contents as pouring begins.

We approximate Q_g by assuming that user input will cease in the next timestep, which allows us to use the cost function C^r(s, a, u) = C^r(s, a, 0). We can analytically compute the value of this cost function.
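Putting these pieces together, the sketch below selects an action by evaluating Q(b, a, u) = Σ_g b(g) Q_g(x, a, u) over a set of candidate actions. The discretized candidate set and the function names are our simplifications (the full system works in continuous action spaces), and q_g stands in for the per-goal cost-to-go computed with the u = 0 approximation above.

def qmdp_action(belief, candidate_actions, robot_state, user_input, q_g):
    """Hindsight-optimization (QMDP) action selection.

    belief:            dict goal -> b(g) = p(g | xi_{0->t}, eta_{0->t})
    candidate_actions: iterable of robot actions to evaluate
    q_g:               function (goal, x, a, u) -> estimated cost-to-go Q_g(x, a, u)
    """
    def expected_cost(a):
        # Q(b, a, u) = sum over goals of b(g) * Q_g(x, a, u)
        return sum(b * q_g(g, robot_state, a, user_input) for g, b in belief.items())
    return min(candidate_actions, key=expected_cost)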

Implementation and Future Work

Egocentric gaze data is collected via a head-mounted eye tracker (Figure 1). We use the Pupil Labs Pupil headset (Kassner, Patera, and Bulling 2014), an open-source eye tracker that is worn like eyeglasses. The outward-facing world camera provides a "user's eye view" of the scene, while the inward-facing eye camera records the pupil position and uses it to identify the gaze location on the world camera image (Figure 3).

To identify object and eye gaze locations, objects in the scene and the table are tagged with visual fiducial markers. Object positions and orientations can thus be calculated in 3D relative to the table. Additionally, the world camera extrinsics can be identified automatically, and correspondingly, the user's head is localized in the same 3D frame as the objects. Therefore, though our gaze-based goal detection algorithm operates on 2D images, robot assistance can be provided in 3D space once the goal object is identified.
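One way to recover a marker's pose from the world-camera image, sketched here under our own assumptions: the marker's corner pixels have already been detected by a fiducial detection library, the physical tag size and camera intrinsics are known, and OpenCV's solvePnP performs the 2D-3D registration. This is an illustration, not the system's implementation.

import numpy as np
import cv2

def marker_pose(corners_px, tag_size, camera_matrix, dist_coeffs):
    """Estimate a fiducial marker's pose relative to the world camera.

    corners_px:    4x2 array of detected corner pixels, ordered consistently
                   with the 3D corner layout below
    tag_size:      physical side length of the square tag, in meters
    camera_matrix: 3x3 intrinsic matrix of the world camera
    dist_coeffs:   lens distortion coefficients (or None)
    Returns (R, t): rotation matrix and translation of the tag in the camera frame.
    """
    half = tag_size / 2.0
    # 3D corner coordinates in the tag's own frame (z = 0 plane)
    corners_3d = np.array([[-half,  half, 0.0],
                           [ half,  half, 0.0],
                           [ half, -half, 0.0],
                           [-half, -half, 0.0]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(corners_3d,
                                  np.asarray(corners_px, dtype=np.float64),
                                  camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("solvePnP failed for this marker")
    R, _ = cv2.Rodrigues(rvec)  # convert rotation vector to a 3x3 matrix
    return R, tvec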

This work is currently in the data collection phase. We plan to collect examples of users completing everyday manipulation tasks with their own hands (such as unscrewing a jar, dialing a phone, or pouring a glass of water from a pitcher), which will provide gaze trajectories that can inform a more sophisticated model for goal prediction via gaze. Once the gaze-based goal prediction is implemented, we can evaluate the system by comparing user performance on robotic manipulation tasks in three conditions: full teleoperation, which acts as a baseline; shared autonomy without gaze prediction, as in prior work; and shared autonomy that integrates gaze into its goal prediction.

Acknowledgments

Thank you to Laura Herlant and EJ Eppinger for their help collecting and visualizing gaze data. This work has been partially funded by DARPA SIMPLEX (ARO contract #67904LSDRP), NSF CPS (#1544797), NSF NRI (#1227495), and ONR (#N00014-16-1-2084).

References

Aarno, D.; Ekvall, S.; and Kragic, D. 2005. Adaptive virtual fixtures for machine-assisted teleoperation tasks. In IEEE International Conference on Robotics and Automation (ICRA), 1139–1144.
Bien, Z.; Chung, M.-J.; Chang, P.-H.; Kwon, D.-S.; Kim, D.-J.; Han, J.-S.; Kim, J.-H.; Kim, D.-H.; Park, H.-S.; Kang, S.-H.; et al. 2004. Integration of a rehabilitation robotic system (KARES II) with human-friendly man-machine interaction units. Autonomous Robots 16(2):165–191.
Borji, A., and Itti, L. 2013. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1):185–207.
Boucher, J.-D.; Pattacini, U.; Lelong, A.; Bailly, G.; Elisei, F.; Fagel, S.; Dominey, P. F.; and Ventre-Dominey, J. 2012. I Reach Faster When I See You Look: Gaze Effects in Human-Human and Human-Robot Face-to-Face Cooperation. Frontiers in Neurorobotics 6(May):1–11.
Chong, E. K. P.; Givan, R. L.; and Chang, H. S. 2000. A framework for simulation-based network control via hindsight optimization. In IEEE Conference on Decision and Control.
Ciocarlie, M.; Hsiao, K.; Leeper, A.; and Gossow, D. 2012. Mobile manipulation through an assistive home robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5313–5320.
Dragan, A., and Srinivasa, S. 2013. A policy blending formalism for shared control. The International Journal of Robotics Research.
Fagg, A. H.; Rosenstein, M.; Platt, R.; and Grupen, R. A. 2004. Extracting user intent in mixed initiative teleoperator control. In Proceedings of the American Institute of Aeronautics and Astronautics Intelligent Systems Technical Conference.
Hauser, K. 2013. Recognition, prediction, and planning for assisted teleoperation of freeform tasks. Autonomous Robots 35(4):241–254.
Hayhoe, M., and Ballard, D. 2005. Eye movements in natural behavior. Trends in Cognitive Sciences 9(4):188–194.
Herlant, L. V.; Holladay, R. M.; and Srinivasa, S. S. 2016. Assistive teleoperation of robot arms via automatic time-optimal mode switching. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 35–42.

Huang, C.-M., and Mutlu, B. 2016. Anticipatory robot control for efficient human-robot collaboration. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 83–90.
Huang, C.-M.; Andrist, S.; Sauppe, A.; and Mutlu, B. 2015. Using gaze patterns to predict task intent in collaboration. Frontiers in Psychology 6.
Itti, L., and Koch, C. 2001. Computational modelling of visual attention. Nature Reviews Neuroscience 194–203.
Javdani, S.; Srinivasa, S. S.; and Bagnell, J. A. 2015. Shared autonomy via hindsight optimization. In Robotics: Science and Systems (RSS).
Johansson, R. S.; Westling, G.; Backstrom, A.; and Flanagan, J. R. 2001. Eye-hand coordination in object manipulation. The Journal of Neuroscience 21(17):6917–6932.
Kassner, M.; Patera, W.; and Bulling, A. 2014. Pupil: an open source platform for pervasive eye tracking and mobile gaze-based interaction. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, 1151–1160.
Kofman, J.; Wu, X.; Luu, T. J.; and Verma, S. 2005. Teleoperation of a robot manipulator using a vision-based human-robot interface. IEEE Transactions on Industrial Electronics 52(5):1206–1219.
Koppula, H. S., and Saxena, A. 2016. Anticipating human activities using object affordances for reactive robotic response. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(1):14–29.
Kron, A.; Schmidt, G.; Petzold, B.; Zah, M. I.; Hinterseer, P.; and Steinbach, E. 2004. Disposal of explosive ordnances by use of a bimanual haptic telepresence system. In IEEE International Conference on Robotics and Automation (ICRA), volume 2, 1968–1973.
Land, M. F., and Hayhoe, M. 2001. In what ways do eye movements contribute to everyday activities? Vision Research 41(25-26):3559–65.
Li, Y.; Fathi, A.; and Rehg, J. M. 2013. Learning to predict gaze in egocentric video. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 3216–3223.
Lim, S.; Lee, K.; and Kwon, D. 2003. Human friendly interfaces of robotic manipulator control system for handicapped persons. In Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM).
Littman, M. L.; Cassandra, A. R.; and Kaelbling, L. P. 1995. Learning policies for partially observable environments: Scaling up. In International Conference on Machine Learning (ICML).
Maheu, V.; Frappier, J.; Archambault, P. S.; and Routhier, F. 2011. Evaluation of the JACO robotic arm: Clinico-economic study for powered wheelchair users with upper-extremity disabilities. In IEEE/RAS-EMBS International Conference on Rehabilitation Robotics (ICORR), 1–5.
Muelling, K.; Venkatraman, A.; Valois, J.-S.; Downey, J.; Weiss, J.; Javdani, S.; Hebert, M.; Schwartz, A. B.; Collinger, J. L.; and Bagnell, J. A. D. 2015. Autonomy infused teleoperation with application to BCI manipulation. In Robotics: Science and Systems (RSS).
Pelz, J.; Hayhoe, M.; and Loeber, R. 2001. The coordination of eye, head, and hand movements in a natural task. Experimental Brain Research 139(3):266–277.
Tsui, K.; Yanco, H.; Kontak, D.; and Beliveau, L. 2008. Development and evaluation of a flexible interface for a wheelchair mounted robotic arm. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 105–112.
Vig, E.; Dorr, M.; and Cox, D. 2014. Large-scale optimization of hierarchical features for saliency prediction in natural images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Volosyak, I.; Ivlev, O.; and Graser, A. 2005. Rehabilitation robot FRIEND II—the general concept and current implementation. In IEEE/RAS-EMBS International Conference on Rehabilitation Robotics (ICORR), 540–544.
Wang, Z.; Mulling, K.; Deisenroth, M. P.; Amor, H. B.; Vogt, D.; Scholkopf, B.; and Peters, J. 2013. Probabilistic movement modeling for intention inference in human-robot interaction. The International Journal of Robotics Research 32(7):841–858.
Yi, W., and Ballard, D. 2009. Recognizing behavior in hand-eye coordination patterns. The International Journal of Robotics Research 6(03):337–359.
Yoon, S. W.; Fern, A.; Givan, R.; and Kambhampati, S. 2008. Probabilistic planning via determinization in hindsight. In AAAI Conference on Artificial Intelligence.
Yu, C., and Ballard, D. H. 2002. Understanding human behaviors based on eye-head-hand coordination. In Biologically Motivated Computer Vision, volume 2525 of LNCS. Springer. 611–619.
Ziebart, B. D.; Maas, A. L.; Bagnell, J. A.; and Dey, A. K. 2008. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence, 1433–1438.
Ziebart, B. D.; Ratliff, N.; Gallagher, G.; Mertz, C.; Peterson, K.; Bagnell, J. A.; Hebert, M.; Dey, A. K.; and Srinivasa, S. 2009. Planning-based prediction for pedestrians. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 3931–3936.
Ziebart, B.; Dey, A.; and Bagnell, J. A. 2012. Probabilistic pointing target prediction via inverse optimal control. In International Conference on Intelligent User Interfaces (IUI), 1–10.
