cdiLect2 - inf.ed.ac.uk · Models of emotion and drives for a sociable robot Kismet’s motivations...
9/19/13
1
Quoting from Breazeal 2003 1
Case Studies in Design Informatics 1 Jon Oberlander
Lecture 2: Designing a Sociable Robot – Kismet
Slides quote or paraphrase Breazeal 2003
http://www.inf.ed.ac.uk/teaching/courses/cdi1/
Course Timetable
Week | Topic | Mon | Wed | Thu | Submit 16:00 Thu
1 | HRI | Intro (JO) | - | Designing a robot (JO) | -
2 | HRI | State-of-the-art (JO) | Tutorial | Towards JAMES (JO) | -
3 | HRI | JAMES (RP) | Tutorial | JAMES (RP) | -
4 | HDI | Quantified selves (JO) | Tutorial | <COLLIDER> | A1
5 | HDI | Quantified problems (JO) | Tutorial | LPE (JM) | -
6 | HDI | LPE (AV) | Tutorial | Strategic behaviour (SR) | A2-draft
7 | HDI | Actigraphy (MW) | Tutorial* | Life logging (TBC) | -
8 | HDI | Light logging (TBC) | Tutorial | <COLLIDER> | A2
9 | - | Reflection (JO) | Tutorial | Reflection (JO) | -
10 | - | - | Tutorial | - | -
11 | - | - | (Tutorial) | - | A3
Structure of lecture
Breazeal’s Kismet project:
1. Video introduction
2. Emotion model
3. Emotional expression
4. Lessons
Video introduction to Kismet
Hardware introduction (4:00) – http://www.ai.mit.edu/projects/sociable/movies/hardware-narrative.mov
Visual attention (1:00) – http://www.ai.mit.edu/projects/sociable/movies/attention-narrative.mov
Affect and feedback (2:30) – http://www.ai.mit.edu/projects/sociable/movies/affective-intent-narrative.mov
Emotional drives (4:00) – http://www.ai.mit.edu/projects/sociable/movies/emotion-narrative.mov
Turn taking (1:30) – http://www.ai.mit.edu/projects/sociable/movies/conversation-narrative.mov
Cynthia Breazeal 2003
Emotion and sociable humanoid robots
Int. J. Human-Computer Studies, 59, 119–155
Reeves and Nass 1996: – "a social interface may be a truly universal interface"
The ability for people to naturally communicate with such machines is important.
However, for suitably complex environments and tasks, the ability for people to intuitively teach these robots will also be important.
Ideally, the robot could engage in various forms of social learning (imitation, emulation, tutelage, etc.), so that one could teach the robot just as one would teach another person.
Kismet’s design
Kismet engages people in natural and expressive face-to-face interaction:
– to eventually learn from them, reminiscent of parent–infant exchanges
perceives a variety of natural social cues from visual and auditory channels, and
delivers social signals to the human through gaze direction, facial expression, body posture, and vocal babble
Actuators, with 21 degrees of freedom (DoF):
– 3 DoF direct the robot’s gaze
– 3 control the orientation of its head, and
– 15 move its facial features (e.g., eyelids, eyebrows, lips, and ears).
Sensors:
– 4 color CCD cameras:
  • there is one narrow field of view camera behind each pupil and the remaining two wide field of view cameras are mounted between the robot’s eyes
– 2 small microphones
  • one mounted on each ear
– 1 lavalier microphone worn by the person is used to process their vocalizations.
Emotions
Basic, e.g. anger, disgust, fear, joy, sorrow, and surprise
– endowed by evolution because of their proven ability to facilitate adaptive responses
– important reinforcers for learning new behavior
– a relevance-detection and response-preparation system
Scherer:
– people affectively appraise events with respect to novelty, intrinsic pleasantness, goal/need significance, coping, and norm/self compatibility.
These appraisals (along with other factors such as pain, hormone levels, drives, etc.) evoke a particular emotion that recruits response tendencies within multiple systems:
– physiological changes (such as modulating arousal level via the autonomic nervous system),
– adjustments in subjective experience,
– elicitation of behavioral response (such as approach, attack, escape, etc.), and
– displaying expression.
Emotions establish a desired relation between the organism and the environment that
– pulls the creature toward certain stimuli and events and
– pushes it away from others.
Expressive characteristics of emotion in voice, face, gesture, and posture serve an important function in communicating emotional state to others [and influencing them].
Kismet emotion processing
Kismet’s purpose is to socially engage people and ultimately to learn from them.
Kismet’s emotion and drive processes are designed such that the robot is in an alert and mildly positive valenced state when it is interacting well with people and when the interactions are neither overwhelming nor under-stimulating.
This corresponds to an environment that affords high learning potential, as the interactions slightly challenge the robot yet also allow Kismet to perform well.
Drives – and the function of emotional responses
Drive to engage people
– motivates the robot to be in the presence of people and to interact with them
Drive to engage toys
– motivates the robot to interact with things, such as colorful toys
Drive to occasionally rest
– allows the robot to shut out the external world, instead of trying to regulate its interaction with it
The robot’s emotional responses
– mirror those of biological systems and therefore should seem plausible to a human.
– bias the robot’s behavior to bring it into contact with desired stimuli (orientation or exploration), or to avoid poor quality or dangerous stimuli (protection or rejection).
– [provide] a social signal to the human, who responds in a way to further promote the robot’s “well-being.”
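One way to picture a drive is as a quantity that drifts when its need is unmet and is pushed back by stimulation. The sketch below is illustrative only: the `Drive` class, its numeric ranges, the drift rate, and the regime names are assumptions made for this example, not values taken from Breazeal 2003.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical homeostatic drive; level 0.0 is the ideal midpoint."""
    name: str
    level: float = 0.0   # positive = under-stimulated, negative = overwhelmed
    drift: float = 0.1   # assumed growth per tick while the need is unmet
    band: float = 0.5    # assumed half-width of the comfortable regime

    def update(self, stimulation: float) -> None:
        # An unmet need drifts the level upward; stimulation pushes it down.
        # The level is clamped to the assumed range [-1, 1].
        self.level = max(-1.0, min(1.0, self.level + self.drift - stimulation))

    def regime(self) -> str:
        if self.level > self.band:
            return "under-stimulated"
        if self.level < -self.band:
            return "overwhelmed"
        return "homeostatic"


social = Drive("engage-people")
for _ in range(6):
    social.update(stimulation=0.0)   # nobody is interacting with the robot
print(social.regime())
```

With no stimulation the drive leaves its comfortable band, which is the situation in which an emotional response would bias behaviour toward seeking people.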
Emotion processing
An “emotional” reaction for Kismet consists of:
– a precipitating event;
– an affective appraisal of that event;
– a characteristic expression (face, voice, posture);
– action tendencies that motivate a behavioral response.
High level perception invokes releaser processes:
– simple “cognitive” assessments that combine
  • lower-level perceptual features with
  • measures of … internal [robot] state into
  • behaviorally significant perceptual categories.
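A releaser can be read as a small function from a percept plus internal state to a category label. The following is a minimal sketch under stated assumptions: the function name `releaser`, the percept dictionary keys, the 0.2 m threshold, and the category strings are all hypothetical and chosen only to illustrate the idea.

```python
from typing import Optional

# Hypothetical mapping from the active drive to the stimulus kind it seeks.
DRIVE_SEEKS = {"engage-people": "face", "engage-toys": "toy"}


def releaser(percept: dict, active_drive: str) -> Optional[str]:
    """Turn a low-level percept plus internal state into a perceptual category."""
    kind = percept.get("kind")
    if kind not in ("face", "toy"):
        return None  # nothing behaviorally significant was perceived
    # Assumed rule: anything closer than 0.2 m counts as threatening.
    if percept.get("distance_m", 1.0) < 0.2:
        return "threatening-" + kind
    # The active drive decides whether the stimulus is desired or undesired.
    wanted = DRIVE_SEEKS.get(active_drive)
    return ("desired-" if kind == wanted else "undesired-") + kind


print(releaser({"kind": "toy", "distance_m": 1.0}, "engage-toys"))
print(releaser({"kind": "toy", "distance_m": 1.0}, "engage-people"))
```

The same toy is labelled differently depending on which drive is active, which is the sense in which the category is behaviorally significant rather than purely perceptual.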
Releasers' activation and evaluation
Three factors determine whether a releaser is active:
– Drives - the currently active drive decides whether a stimulus is desired or undesired
– Affective state - e.g. the soothing speech releaser is only active if the state is distressed
– Active behaviours - e.g. the no-face condition has a different effect, depending on whether, and for how long, a face search has been under way
Activated releasers are then evaluated in terms of
– Arousal (how stimulating is the stimulus)
– Valence (how positive is the stimulus)
– Stance (how approachable is the stimulus)
Four factors contribute to this evaluation:
– Intensity of stimulus (close, fast, big = high arousal)
– Relevance (current goal = positive valence, stance)
– Intrinsic affect (praise = positive valence, arousal; scolding = negative valence, arousal; threats = negative valence, high arousal, withdraw stance)
– Goal directedness (progress towards success vs delay)
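The four-factor evaluation above can be sketched as a function that builds an (arousal, valence, stance) tag. Everything numeric here is an assumption for illustration: the `AVS` type, the magnitudes in `INTRINSIC`, the 0.5 relevance bonus, and the choice to let goal progress act on valence are not specified on the slide. Scolding is read here as negative on both valence and arousal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AVS:
    """An affective tag: arousal, valence, stance (assumed range about -1..1)."""
    arousal: float = 0.0
    valence: float = 0.0
    stance: float = 0.0

    def __add__(self, other: "AVS") -> "AVS":
        return AVS(self.arousal + other.arousal,
                   self.valence + other.valence,
                   self.stance + other.stance)


# Signs follow the slide; the magnitudes are made up for this sketch.
INTRINSIC = {
    "praise": AVS(arousal=0.3, valence=0.5),
    "scolding": AVS(arousal=-0.3, valence=-0.5),
    "threat": AVS(arousal=0.8, valence=-0.6, stance=-0.8),
}


def appraise(intensity: float, relevant: bool,
             intrinsic: AVS = AVS(), progress: float = 0.0) -> AVS:
    """Combine the four factors into one tag (each weighting is an assumption)."""
    tag = AVS(arousal=intensity)                   # intensity raises arousal
    if relevant:
        tag = tag + AVS(valence=0.5, stance=0.5)   # relevance: positive valence, stance
    tag = tag + intrinsic                          # intrinsic affect of the stimulus
    return tag + AVS(valence=progress)             # assumed: progress > 0, delay < 0


print(appraise(intensity=0.8, relevant=False, intrinsic=INTRINSIC["threat"]))
```

A close, fast threat ends up with high arousal, negative valence, and a withdrawing stance, matching the qualitative description in the list.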
Emotions, expressions, and behaviour
AVS (arousal, valence, stance) values then activate multiple emotions (to varying degrees)
Expression of those emotions proceeds if their activation exceeds a threshold level.
A winner-take-all arbitration picks the top emotion, which [when it influences behaviour] will then help pull the robot system back towards equilibrium.
Expression precedes behaviour, allowing people to predict what Kismet will do before it happens.
– (e.g. shake toy too close -> fear expression, then escape movement)
AVS values map onto facial/vocal effector states
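The activation, threshold, and winner-take-all steps can be sketched as follows. This is not Kismet's actual computation: the prototype coordinates, the distance-based activation rule, and the single shared threshold are assumptions chosen to make the arbitration idea concrete.

```python
import math

# Hypothetical emotion prototypes as (arousal, valence, stance) points.
PROTOTYPES = {
    "joy": (0.3, 0.8, 0.5),
    "fear": (0.9, -0.7, -0.9),
    "anger": (0.8, -0.7, 0.6),
    "sorrow": (-0.6, -0.7, 0.0),
}


def activations(avs):
    """Assumed rule: activation falls off linearly with distance to the prototype."""
    return {name: max(0.0, 1.0 - math.dist(avs, proto))
            for name, proto in PROTOTYPES.items()}


def arbitrate(avs, threshold=0.4):
    """Return (emotions above threshold, winning emotion or None)."""
    acts = activations(avs)
    expressed = [name for name, a in acts.items() if a > threshold]
    winner = max(acts, key=acts.get)          # winner-take-all over activations
    return expressed, (winner if acts[winner] > threshold else None)


# A toy shaken too close: high arousal, negative valence, withdrawing stance.
print(arbitrate((0.9, -0.7, -0.8)))
```

Several emotions can be active at once, but only the winner goes on to influence behaviour, and nothing is expressed when every activation stays below the threshold.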
expressions, as long as those expressions share a family resemblance. The resemblance exists because the expressions share common facial action units. The facial action units characterize how each facial muscle (or combination of facial muscles) adjusts the skin and facial features to produce human expressions and facial movements (Ekman and Friesen, 1982). It is also known that different expressions for different emotions share some of the same face action components (the raised brows of fear and surprise, for instance). It is hypothesized by Smith and Scott (1997) that those features held in common assign a shared affective meaning to each facial expression. The raised brows, for instance, convey attentional activity for both fear and surprise.
5. Models of emotion and drives for a sociable robot
Kismet’s motivations (i.e., its “drives” and “emotions”) establish its nature by defining its “needs” and influencing how and when it acts to satisfy them. As a convention, we use a different font to distinguish parts of the architecture of this particular system from the general uses of this word. For instance, emotion refers to the particular set of computational processes that are active in the system. When the
[Figure: axes arousal to sleep and displeasure to pleasure; labelled points: neutral, excitement, depression, stress, calm, afraid, frustrated, relaxed, content, elated, bored, sad, happy, surprise, sleepy.]
Fig. 2. Russell’s pleasure-arousal space for facial expression. Adapted from Russell (1997).
C. Breazeal / Int. J. Human-Computer Studies 59 (2003) 119–155, p. 126
valence prototype, P_positive, maps to a content expression. The negative valence prototype, P_negative, resembles an unhappy expression. The closed stance prototype, P_closed, resembles a stern expression, and the open stance prototype, P_open, resembles an accepting expression.
The three affect dimensions also map to affective postures. There are six prototype postures defined, which span the space. High arousal corresponds to an erect posture with a slight upward chin. Low arousal corresponds to a slouching posture where the neck lean and head tilt are lowered. The posture remains neutral over the valence dimension. An open stance corresponds to a forward lean movement, which suggests strong interest toward the stimuli the robot is leaning toward. A closed stance corresponds to withdraw, reminiscent of shrinking away from whatever the robot is looking at.
The remaining three facial prototypes are used to strongly distinguish the expressions for disgust, anger, and fear. Recall that four of the six primary emotions are characterized by negative valence. Whereas the primary six basis postures (presented above) can generate a range of negative expressions from distress to sorrow, the expressions for intense anger (rage), intense fear (terror), and intense disgust have some uniquely distinguishing features. For instance, the prototype for disgust, P_disgust, is unique in its asymmetry (typical of this expression such as the curling of one side of the lip). The prototypes for anger, P_anger, and fear, P_fear, each have a distinct configuration for the lips (furious lips form a snarl, terrified lips form a grimace).
[Figure: axes low arousal to high arousal, negative valence to positive valence, closed stance to open stance; labelled postures: surprise, unhappy, tired, anger, fear, disgust, accepting, stern, content.]
Fig. 7. This diagram illustrates where the basis postures are located in affect space.
A range of sample expressions generated with this technique is shown in Fig. 8, although the system can generate a much broader range. The procedure runs in real time, which is critical for social interaction.
Given this 3-D affect space, this approach resonates well with the work of Smith and Scott (1997). They posit a 3-D space of pleasure–displeasure (maps to valence here), attentional activity (maps to arousal here), and personal agency, control (roughly maps to stance here). Table 1 summarizes their proposed mapping of facial actions to these dimensions. They posit a fourth dimension that relates to the intensity of the expression. For Kismet, the expressions become more intense as the affect state moves to more extreme values in the affect space. As positive valence increases, Kismet’s lips turn upward, the mouth opens, and the eyebrows relax. However, as valence decreases, the brows furrow, the jaw closes, and the lips turn downward. Along the arousal dimension, the ears perk, the eyes widen, and the mouth opens as arousal increases. Along the stance dimension, increasing positive values cause the eyebrows to arc outwards, the mouth to open, the ears to open, and the eyes to widen. These face actions roughly correspond to a decrease in personal agency/control in Smith and Scott’s framework. For Kismet, it engenders an expression that looks more eager and accepting (or more uncertain for negative emotions). Although Kismet’s dimensions do not map exactly to those hypothesized by Smith and Scott (1997), the idea of combining meaningful face action units in a principled manner to span the space of facial expressions, and to also relate them in a consistent way to emotion categories, holds strong.
[Figure: sample expressions labelled Anger, Calm, Fear, Content, Disgust, Interest, Sorrow, Surprise, Tired.]
Fig. 8. Kismet is capable of generating a continuous range of expressions of various intensities by blending the basis facial postures. Facial movements correspond to affect dimensions in a principled way. A sample is shown here.
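Blending basis postures can be illustrated with a simple weighted sum. This sketch assumes a toy face with three degrees of freedom and a linear blend; the `BASIS` vectors, the DoF names, and the blending rule are hypothetical and far simpler than Kismet's 15 facial DoF, though the direction of each movement follows the description in the text (lips turn upward with positive valence, ears perk and eyes widen with arousal).

```python
# Hypothetical 3-DoF face: [lip_corner, ear_height, eye_opening], neutral = 0.
BASIS = {
    ("arousal", +1): [0.0, 1.0, 1.0],    # ears perk, eyes widen
    ("arousal", -1): [0.0, -1.0, -1.0],  # ears droop, eyes close
    ("valence", +1): [1.0, 0.0, 0.0],    # lips turn upward
    ("valence", -1): [-1.0, 0.0, 0.0],   # lips turn downward
    ("stance", +1): [0.0, 0.5, 0.5],     # more open and accepting
    ("stance", -1): [0.0, -0.5, -0.5],   # more closed and stern
}


def blend(arousal: float, valence: float, stance: float) -> list:
    """Blend basis postures; more extreme affect gives a more intense pose."""
    pose = [0.0, 0.0, 0.0]
    for axis, value in (("arousal", arousal), ("valence", valence), ("stance", stance)):
        if value == 0:
            continue                      # this axis contributes nothing
        proto = BASIS[(axis, 1 if value > 0 else -1)]
        weight = min(abs(value), 1.0)     # assumed linear weighting, capped at 1
        pose = [p + weight * b for p, b in zip(pose, proto)]
    return pose


print(blend(arousal=1.0, valence=0.5, stance=0.0))   # [0.5, 1.0, 1.0]
```

Because the weights vary continuously with the affect values, the output pose varies continuously too, which is what lets a small set of prototypes cover a range of expressions and intensities.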
Emotional expression evaluation
motors. The wide eyes, elevated brows, and elevated ears suggest high arousal. This may account for the confusion with “surprise.”
The still image studies were useful in understanding how people read Kismet’s facial expressions, but it says very little about expressive posturing. Humans and animals not only express with their face, but also with their entire body. To explore this issue for Kismet, we showed a small group of subjects a set of video clips.
There were seven people who filled out a second questionnaire. Six were children of age 12, four boys and two girls. One was an adult female. In each clip Kismet performs a coordinated expression using face and body posture. There were seven videos for the expressions of anger, disgust, fear, joy, interest, sorrow, and surprise. Using a forced-choice paradigm, for each video the subject was asked to select a word that best described the robot’s expression (anger, disgust, fear, joy, interest, sorrow, or surprise). On a ten-point scale, the subjects were also asked to rate the intensity of the robot’s expression and the certainty of their answer. They were also asked to write down any comments they had. The results are compiled in Table 4. Random chance is 14%.
The subjects performed significantly above chance, with overall stronger recognition performance than on the still images alone. The video segments of “anger,” “disgust,” “fear,” and “sorrow” were correctly classified with a higher percentage than the still images. However, there were substantially fewer subjects who participated in the video evaluation than the still image evaluation. The recognition of “joy” most likely dipped from the still-image counterpart because it was sometimes confused with the expression of interest in the video study. The perked ears, attentive eyes, and smile give the robot a sense of expectation that could be interpreted as interest.
Misclassifications are strongly correlated with expressions having similar facial or postural components. “Surprise” was sometimes confused for “fear;” both have a quick withdraw postural shift (the fearful withdraw is more of a cowering movement whereas the surprise posture has more of an erect quality) with wide eyes and elevated ears. “Surprise” was sometimes confused with “interest” as well. Both have an alert and attentive quality, but interest is an approaching movement whereas surprise is more of a startled movement. “Sorrow” was sometimes confused with
Table 4. This table summarizes the results of the video evaluation. Forced-choice percentage (random = 14%).
Shown \ Chosen | Anger | Disgust | Fear | Joy | Interest | Sorrow | Surprise | % Correct
Anger | 86 | 0 | 0 | 14 | 0 | 0 | 0 | 86
Disgust | 0 | 86 | 0 | 0 | 0 | 14 | 0 | 86
Fear | 0 | 0 | 86 | 0 | 0 | 0 | 14 | 86
Joy | 0 | 0 | 0 | 57 | 28 | 0 | 15 | 57
Interest | 0 | 0 | 0 | 0 | 71 | 0 | 29 | 71
Sorrow | 14 | 0 | 0 | 0 | 0 | 86 | 0 | 86
Surprise | 0 | 0 | 29 | 0 | 0 | 0 | 71 | 71
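The figures in Table 4 can be checked with a few lines of code. The matrix below copies the table's percentages, reading rows as the expression shown and columns as the label chosen; the helper names and the 20% cut-off for "notable" confusions are choices made for this sketch. Averaging the diagonal as an overall score assumes every video was rated by the same seven subjects.

```python
LABELS = ["anger", "disgust", "fear", "joy", "interest", "sorrow", "surprise"]

# Rows: expression shown. Columns: label chosen. Values are percentages.
CONFUSION = [
    [86, 0, 0, 14, 0, 0, 0],
    [0, 86, 0, 0, 0, 14, 0],
    [0, 0, 86, 0, 0, 0, 14],
    [0, 0, 0, 57, 28, 0, 15],
    [0, 0, 0, 0, 71, 0, 29],
    [14, 0, 0, 0, 0, 86, 0],
    [0, 0, 29, 0, 0, 0, 71],
]


def percent_correct(matrix):
    """The diagonal: how often each expression was labelled correctly."""
    return [row[i] for i, row in enumerate(matrix)]


def confusions(matrix, labels, min_pct=20):
    """Off-diagonal cells at or above min_pct, as (shown, chosen, percent)."""
    return [(labels[i], labels[j], pct)
            for i, row in enumerate(matrix)
            for j, pct in enumerate(row)
            if i != j and pct >= min_pct]


chance = 100 / len(LABELS)                       # seven options: about 14%
correct = percent_correct(CONFUSION)
print(round(chance))                             # 14
print(round(sum(correct) / len(correct), 1))     # 77.6
print(confusions(CONFUSION, LABELS))
```

The notable confusions returned are joy with interest, interest with surprise, and surprise with fear, which are the pairs the text attributes to shared facial or postural components.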
(Human) emotion input
Kismet recognized four affective intents (i.e. praise, prohibition, attentional bids, and soothing) from a person’s vocal prosody.
– A recognizer was designed to categorize these.
When interfaced with Kismet’s emotion system, the person could manipulate the robot’s affective state through tone of voice, causing the robot to become
– more positive through praising tones,
– more aroused through alerting tones,
– more “sad” through scolding tones, and
– moderately aroused through soothing tones.
Lessons from Kismet: Timing
Note the ability to engage people in
– face-to-face,
– rich,
– dynamic,
– mutually regulated, and
– closely coupled affective interactions.
engaging because …
– expressive behavior is timely and … synchronized with the human’s behavior ([timings] less than a second).
… a natural flow and rhythm to … interaction … stimulating for the robot, … compelling for the person.
Lessons from Kismet: HRI is not just HCI
Robots not only have to carry out their tasks, they also have to survive in the human environment.
The ability of robots to adapt and learn in such an environment is fundamental.
For robots, social and emotive qualities serve not only
– to “lubricate” the interface between humans and robots, but also
– to play a pragmatic role in promoting
  • survival,
  • self maintenance,
  • learning,
  • decision-making,
  • attention, and more
[One way to capture this is to say that social qualities are not just “for the interface”.]
Lessons from Kismet: Emotions as (HCI) affordances?
[W]hen designing robots that interact with humans in the real world, the issue is not so much
– whether robots should have social and emotive characteristics,
– but of what kind.
… The robot’s observable behavior and the manner in which it responds and reacts to people profoundly shapes the interaction and the mental model people have for the robot
… it is important that the robot not only does the right thing, but also at the right time and in the right manner