Toward computers that recognize and respond to user emotion




by R. W. Picard

For a long time emotions have been kept out of the deliberate tools of science; scientists have expressed emotion, but no tools could sense and respond to their affective information. This paper highlights research at the MIT Media Laboratory aimed at giving computers the ability to comfortably sense, recognize, and respond to the human communication of emotion, especially affective states such as frustration, confusion, interest, distress, anger, and joy. Two main themes of sensing—self-report and concurrent expression—are described, together with examples of systems that give users new ways to communicate emotion to computers and, through computers, to other people. In addition to building systems that try to elicit and detect frustration, our research group has built a system that responds to user frustration in a way that appears to help alleviate it. This paper highlights applications of this research to interface design, wearable computing, entertainment, and education, and briefly presents some potential ethical concerns and how they might be addressed.

Not all computers need to “pay attention” to emotions or to have the capability to emulate emotion. Some machines are useful as rigid tools, and it is fine to keep them that way. However, there are situations in which human-computer interaction could be improved by having the computer adapt to the user, and in which communication about when, where, how, and how important it is to adapt involves the use of emotional information.

Findings of Reeves and Nass at Stanford University suggest that the interaction between human and machine is largely natural and social,1 indicating that factors important in human-human interaction are also important in human-computer interaction. In human-human interaction, it has recently been argued that skills of so-called “emotional intelligence” are more important than are traditional mathematical and verbal skills of intelligence.2 These skills include the ability to recognize the emotions of another and to respond appropriately to these emotions. Whether or not these particular skills are more important than certain other skills will depend on the situation and goals of the user, but what is clear is that these skills are important in human-human interaction, and when they are missing, interaction is more likely to be perceived as frustrating and not very intelligent.

Consider an example of human-human interaction:

Suppose that a human starts to give you help at a bad time. You try ignoring, then frowning at, and then maybe glaring at him or her. The savvy human infers you do not like what just happened, ceases the interruption, notes the context, and learns from the feedback.

This scenario is easily modified for human-computer interaction and is relevant to the growing use of software that is designed to interrupt a person with reminders, hints, and offers of suggestions. Now we take out the word “human” in the example and replace it with the word “computer”:





Suppose that a computer starts to give you help at a bad time. You try ignoring, then frowning at, and then maybe glaring at it. The savvy computer infers you do not like what just happened, ceases the interruption, notes the context, and learns from the feedback.

Of course, the “savvy computer” does not exist yet. Today’s computers largely ignore the user’s signs of frustration. This situation is similar to having a person ignore your signs of growing irritation, which would likely lead you to think of that person as annoying and rather stupid—unless, perhaps, you were highly intimidated by the person, in which case you might blame your own actions. An analogy with human-computer interaction also holds, where people think of computers as annoying and stupid, or as intimidating, and of themselves as dummies. The similarity in the interaction holds despite the fact that people know that computers are not humans and that computers do not have the same capabilities as humans. We all know better than to treat computers like humans, and yet many of our default behaviors and responses tend in this direction.

Giving computers skills of emotional intelligence in the broad sense originally described by Salovey and Mayer3 involves more than giving them the ability to recognize and respond to human emotions. It involves other aspects of affective computing—computing that relates to, arises from, and deliberately influences emotion—as well as other nonaffective capabilities, such as sensing of and reasoning about a context. A fairly extensive overview of research related to affective computing, up to and including early 1997, can be found in the author’s book.4 This paper does not attempt an overview of the field since publication of the book but rather focuses on affect communication research at the Massachusetts Institute of Technology (MIT) Media Laboratory since 1997. The reader who is interested in learning more about the latest affect-related work at other locations might turn to the numerous recent workshops and special sessions that address a broad variety of aspects of emotion and computation related to artificial intelligence, software agents, and human-computer interfaces; several of these have published or soon-to-be published proceedings.5–10 Also, the references given at the end of this paper contain many citations to related work done outside the MIT Media Lab.

Although the work of this paper focuses on recognition and response to user emotion, there are other aspects of affective computing, such as the use of emotion-like mechanisms within the machine to help make decisions, adjust perception, and influence learning. Even though the author is also involved in some of that research going on outside the Media Lab at MIT,4,11,12 discussion of those areas of affective computing will not be included in this paper. Here, the focus is on the area of giving machines new abilities to recognize and respond to emotion, without actually giving them emotion-like mechanisms.

In the Affective Computing Research Group at MIT, we are particularly interested in the intelligent handling of affective states commonly expressed around computer systems: frustration, confusion, dislike, like, interest, boredom, fear, distress, and joy. The idea is that if the computer detects expressions of these states, and associates them with what it is doing and with other events in the environment, then it can better learn which interactions are successful and which need improvement. In other words, the computer can directly collect usability information related to a user’s emotion, such as what parts of the interface or system were in use when the user expressed the greatest frustration. This information can be passed on to a designer, who can try to use it to improve future versions of the system. Alternatively, if the system itself is “smart” and entrusted with its own adaptation, then it can use the affective feedback to try to improve its behavior. The latter is riskier, in the sense that few systems are capable yet of doing this in a way that would not simply cause the user’s frustration to escalate. Successful adaptation to users is an important research area for machine learning and human-computer interaction, and I expect to see future systems move toward the latter aim as well as the former.

The rest of this paper is divided into three subject areas: (1) enabling the user to communicate emotion in a way that is physically and psychologically comfortable, generally by offering several possible styles of direct and indirect sensing and new tools for communication; (2) reducing user frustration, with a focus on how the computer can handle user frustration once it has occurred; and (3) developing applications that handle affective information, with examples from several domains.

Comfortable communication of emotion

Emotion communication requires that a message be both sent and received. Most computer interfaces inhibit such communication. Several people with autism, a complex disorder that typically includes impairment in recognition of emotion, have commented that they love being on the Web because it levels the playing field for them. In a sense, everyone is autistic on line. With the exception of gifted poets and others who work hard to lessen the ambiguity of the emotions expressed by their text e-mails, most of the emotions we show to our keyboards, monitors, and mice are not transmitted. Sometimes this is good; however, often it is a source of miscommunication and misunderstanding, resulting in lost time, damaged relationships, and reduced productivity.

Our research group is building tools to facilitate multiple kinds of emotion expression by people, not to force this on anyone, but to allow for a larger space of possibilities for those who want to communicate affective information. The tools include new hardware and software that we have developed to enable certain machines not only to receive emotional expression, but also to recognize meaningful patterns of emotional expression.

Emotion communication can be conducted with varying degrees of naturalness and accuracy, and with some surprises about what may be considered most preferable. I do not advocate any one means of communication, but believe in providing an array of choices so that users who wish to communicate emotion can pick what they prefer. Two main themes of communication run through the kinds of devices we have built at the Media Lab: self-report and concurrent expression. These two themes are described next, together with an example of a human-human analogy, some pros and cons, and examples of new systems that our group has designed and built.

Self-report. Self-report systems leave it up to the user to go out of his or her way to communicate emotion. The user might make a selection by means of software from a menu of emotions with words or icons, or the user might touch a hardware input device that acts as a tangible icon, e.g., a physical icon, or “phicon” in the language of Ishii and Ullmer.13 In either case, it is up to the user to select an item when he or she feels a certain way and wants the system to know. (If natural language processing were sufficiently advanced, the system could accept freely typed or spoken input instead.) Following are characteristics of self-report systems:

● A human-human analogy of these systems: One person interrupts a conversation or task to clearly state how he or she is feeling.

● The pros of these systems: This option gives the user total conscious control over the message that is sent, and sometimes it is a natural way to express emotion.

● The cons of these systems: The user has to stop the primary task he or she is doing in order to communicate emotion. It is often hard or burdensome to articulate one’s feelings. The menu of words, icons, or phicons can become large and still not capture what one is feeling.

An example of self-report—thumbs-up feedback. To provide ongoing usability feedback, Matt Norwood in the MIT Media Lab has built a thumbs-up and a thumbs-down phicon and attached both to the networked “Mr Java” coffee machine in our lab (Figure 1). This machine usually works fine, making good cappuccino and espresso, but it occasionally displays cryptic messages. Norwood has added the ability to associate user feedback with the status of the machine’s functions, including error states, such as “out of beans” and “out of milk.” When a customer is satisfied, he or she can tap the thumbs-up phicon, and the system will record a notch of satisfaction for the current operating state. If the machine displays an error message that the customer does not understand, such as “empty grounds bin,” the customer might tap the thumbs-down phicon. The system records the continuous stream of reports of satisfaction or dissatisfaction, associating each with one of the 31 operating states of the machine at the time of feedback. This information provides the designer of the machine with a usability report of which features pleased or displeased the most customers, based on when users chose to express such information. Details of our findings with this device can be found in Norwood’s thesis.14
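To make the bookkeeping concrete, here is a minimal sketch of state-tagged feedback logging in the spirit of the Mr Java phicons; the class, the state names, and the log format are illustrative assumptions rather than the actual implementation described in Norwood’s thesis.14

```python
from collections import Counter
from datetime import datetime

class FeedbackLogger:
    """Tallies thumbs-up / thumbs-down taps against the machine's
    current operating state, in the spirit of the Mr Java phicons."""

    def __init__(self):
        self.counts = Counter()   # (state, verdict) -> tap count
        self.events = []          # raw event log for the designer

    def record(self, state: str, verdict: str) -> None:
        # verdict is "thumbs_up" or "thumbs_down"
        self.counts[(state, verdict)] += 1
        self.events.append((datetime.now(), state, verdict))

    def usability_report(self):
        # Summarize which operating states drew the most praise or complaints.
        states = {s for (s, _) in self.counts}
        return {s: (self.counts[(s, "thumbs_up")],
                    self.counts[(s, "thumbs_down")]) for s in states}

# Hypothetical usage: the coffee machine reports its state with each tap.
log = FeedbackLogger()
log.record("NO ERROR", "thumbs_up")
log.record("EMPTY GROUNDS BIN", "thumbs_down")
print(log.usability_report())
```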

Concurrent expression. In concurrent expression, the system attempts to sense affective expression in parallel with the user’s primary task, without the user having to stop what he or she is doing to report his or her feelings. This can happen via whichever sensors the user may choose for the computer to have: video, microphone, typing or mouse-holding pressure, physiology, olfaction, and so forth. Characteristics of concurrent expression follow:

● A human-human analogy: A person expresses emotion in whatever way is most natural for the given situation, while engaged in some task, without having to stop that task.

● The pros: This method aims at the greatest naturalness, placing no additional burden on the user. The user does not have to be interrupted, nor does he or she have to put into words something that might be difficult to express verbally.

● The cons: Users may feel uneasy about a machine sensing things that they themselves are unaware of having expressed. There is more room for misinterpretation, given that much nonverbal information is ambiguously communicated. Many forms of sensing may be considered obtrusive, especially from a privacy standpoint.

One example of a concurrent-expression method—expression glasses. In many cases, such as a human factors assessment of users engaging a complex product, a direct means of assessment is fine after the interaction, but not during it, when it could interfere with the person’s task. Consider a focus group in which participants are asked to indicate the clarity of packaging labels. If while reading line three, a subject furrows his brow in confusion, he has communicated concurrently with the task at hand. Brow furrowing can be detected by a camera if lighting and head position are carefully restricted (otherwise current computer vision techniques are inadequate), but these restrictions, coupled with the recording of identity, can make some subjects uncomfortable.

An alternative is a pair of wearable “expression glasses” (Figure 2) that sense changes in facial muscles, such as furrowing the brow in confusion or interest.15 These glasses have a small point of contact with the brow, but otherwise can be considered less obtrusive than a camera in that the glasses offer privacy, robustness to lighting changes, and the ability to move around freely without having to stay in a fixed position relative to a camera. The expression glasses can be used concurrently, while concentrating on a task, and can be activated either unconsciously or consciously. People are still free to make false expressions, or to have a “poker face” to mask true confusion if they do not want to communicate their true feelings (which I think is good), but if they want to communicate, the glasses offer a virtually effortless way to do so. Details on the design of the glasses and on experiments conducted with the expression glasses are available in the paper by Scheirer et al.15
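As a rough illustration of turning such a brow-muscle signal into discrete events, the following sketch thresholds a simulated sensor reading; the sensor interface and the thresholds are invented for illustration, would need per-wearer calibration, and are not the published design of Scheirer et al.15

```python
import random  # stands in for a real brow-sensor driver in this sketch

def read_brow_sensor() -> float:
    """Placeholder for the glasses' brow-muscle signal, scaled 0..1."""
    return random.random()

def classify_expression(sample: float,
                        furrow_threshold: float = 0.7,
                        neutral_threshold: float = 0.3) -> str:
    # A sustained high reading is treated as a deliberate or unconscious
    # furrow (the confusion/interest cue); a low reading as a neutral brow.
    # Thresholds are illustrative only.
    if sample >= furrow_threshold:
        return "furrowed"
    if sample <= neutral_threshold:
        return "neutral"
    return "uncertain"

for _ in range(5):
    print(classify_expression(read_brow_sensor()))
```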

Figure 1  At left: the customer taps the thumbs-up icon after getting a frothy cup of cappuccino; at right: a bar graph illustrating positive feedback (green) and negative feedback (red) received as a function of system operating state

[Bar graph detail: horizontal axis lists operating states such as NO ERROR, OUT OF BEANS, EMPTY GROUNDS BIN, and PANEL OPEN; bars compare THUMBS UP and COMPLAINT counts on a 0–100 scale]

Photo by Webb Chappell, copyright 1999 Webb Chappell; reprinted with permission



Note that the two main forms of communication just described, self-report and concurrent expression, can be used together. For example, the standard self-report means of giving a subject a questionnaire might be useful after a scenario like the one just described. The two methods complement each other in that they can gather different information. For example, self-report is notoriously inaccurate for getting at true feelings (affected by a subject’s expectations, mood, comfort level, rapport with the experimenter, and so forth) but can still provide useful feedback by helping articulate causes of feelings and other important issues.

Why wear expression glasses, instead of raising your hand or pushing a button to say you are interested or confused? The answer is not that there should be one or the other; both have a distinct role. Sometimes the subtlety of this point is missed: affect is continuously communicated while you are doing just about anything. When you pick up a pen, you will do so very differently when you are angry than when you are joyful. When you watch somebody, your eyes will behave differently if you are interested than if you are bored. As you listen to a conversation or a lecture, your expression gives the speaker feedback, unless, of course, you put on a poker face. In contrast with these concurrent expressions, if you want to push a button to communicate your feelings, or if you want to raise your hand to say you are confused, you have to interrupt what you were doing beforehand.

In studies conducted decades ago, participants were given self-report buttons that they could use in a classroom or meeting to anonymously communicate their feelings and opinions. The users of these systems described the technology as helping to engage them more in the interaction, believing that their opinions mattered more.16 This method was successful when used in a question-and-answer format, e.g., “Did you all like this?” with everyone then pressing “yes” or “no.” However, when the device was used to communicate feedback such as confusion to a professor in a classroom, there were several problems. The key problem was that while a person was concentrating on what was confusing and trying to understand it, he or she would forget to push the button. Self-report is useful when there is a break in the action, with time to ask about and assess what has happened. But at that point, it takes more work to relate the expressed confusion to precisely where it was triggered. In contrast, furrowing the brow happens without necessarily interrupting the flow of action; the listener can change his or her facial expression without having to think about doing so; a person can concentrate on the lecture instead. Self-report is important, but it is no substitute for the natural channels of largely nonverbal communication that humans use concurrently while engaged in conversation, learning, and many other activities.

Figure 2  Expression glasses (left) shown with “stealth” visor for privacy; Jocelyn Scheirer (center and right) wearing sensors that detect an expression of confusion, center (red), and of interest, right (green)

Another example of a concurrent-expression method—physiological analysis. Our lab has been active in developing new interfaces for future computing, including wearable computers that would offer a highly personalized computing environment, which attends not just to a person’s direct input but also to the person’s behavioral and affective cues.17 Toward this goal, we have been developing new wearable sensors that attempt to be more comfortable physically and socially (looking more like fashion accessories than like clunky medical instrumentation), facilitating the gathering of affective information concurrent with day-to-day activities.18 A wearable computer affords a different kind of sensing; in particular, it is relatively easy to obtain signals from the surface of the skin, in contrast with a seated environment where it is easier to point a camera at the user. We have been developing algorithms that can read four physiological signals, comfortably sensed from the skin surface, and relate these to a deliberately expressed emotion. Our recent results have achieved 81 percent recognition accuracy in selecting which of eight emotions was expressed by an actor, given 30 days of data, eight emotions per day, and features of the four signals: respiration, blood volume pressure, skin conductivity, and muscle tension. (See Vyzas and Picard19 for details of the data collection and the algorithms we developed for recognition.) The eight emotions investigated were: neutral, hatred, anger, grief, romantic love, platonic love, joy, and reverence. These results are the best reported to date for emotion recognition from physiology, and they lie between machine recognition results of affect from speech and of affect from facial expressions.
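As a rough sketch of the general recipe (and not the algorithm of Vyzas and Picard,19 which used more elaborate feature selection and classification), the following code computes simple per-signal statistics for one trial and labels unseen trials with a nearest-class-mean rule; the signal names, features, and classifier are illustrative assumptions.

```python
import numpy as np

SIGNALS = ["respiration", "blood_volume_pressure", "skin_conductivity", "emg"]

def features(trial: dict) -> np.ndarray:
    """Simple per-signal statistics (mean, std, mean absolute first
    difference) for one emotion trial; trial maps signal name -> 1-D array."""
    feats = []
    for name in SIGNALS:
        x = np.asarray(trial[name], dtype=float)
        feats += [x.mean(), x.std(), np.abs(np.diff(x)).mean()]
    return np.array(feats)

class NearestClassMean:
    """Toy classifier: label a trial with the emotion whose mean feature
    vector (from the training days) is closest in normalized feature space."""

    def fit(self, X, y):
        X = np.asarray(X, float)
        self.mu, self.sd = X.mean(0), X.std(0) + 1e-9
        Z = (X - self.mu) / self.sd
        self.classes = sorted(set(y))
        self.centroids = {c: Z[[i for i, lab in enumerate(y) if lab == c]].mean(0)
                          for c in self.classes}
        return self

    def predict(self, x):
        z = (np.asarray(x, float) - self.mu) / self.sd
        return min(self.classes,
                   key=lambda c: np.linalg.norm(z - self.centroids[c]))
```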

It should be noted that our results are only for a single user, and they are obtained by a forced selection of one of the eight categories. Hence, these results are comparable to the recognition results in the early days of speech recognition, when the system was retrained for each speaker, and it knew that the person was speaking one of eight words, although there could be variation in how the person spoke the words from day to day. Much more work remains to be done to understand individual differences as well as differences that depend on context—whether developmental, social, or cultural. I expect that, like research in speech recognition, this work will gradually expand to be able to handle speakers (or their affective expression) from different cultures, of different ages, speaking (or expressing) continuously, in a variety of different environments.

Physiological expression is one of many multimodal means of concurrent emotion communication that our group is exploring. We are also beginning to analyze affect in speech, an area in which humans only perform at about 60 percent accuracy (on roughly eight emotion categories, when the content of the speech is obscured). Our initial focus is on speech from automobile drivers who might be conversing on the phone or with an automobile navigation system. We have recently equipped a car to examine driver behavior features jointly with physiological information.20 One such sensor setup is shown in Figure 3, by which the following physiological indicators were measured: muscle tension with an electromyogram (EMG), blood volume pressure (BVP), skin conductivity (SC), and respiration. All these efforts gather data that push the abilities of traditional pattern recognition and signal processing algorithms, which have difficulty handling the day-to-day and interpersonal variations of emotional expression. Consequently, we are also conducting basic research in machine learning theory and in pattern recognition to develop better methods for modeling and inference. I expect that affective systems will use affect as input to a machine learning system, which will not only try to learn to do better at recognizing affect but will also learn to do better at adapting to the user, based on his or her positive or negative feedback.
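One common remedy for day-to-day drift, offered here only as an illustrative sketch and not as our group’s specific method, is to express each day’s features relative to a resting baseline recorded that day:

```python
import numpy as np

def baseline_normalize(day_features: np.ndarray,
                       baseline_features: np.ndarray) -> np.ndarray:
    """Express a day's feature vectors relative to that day's resting
    baseline, a common (illustrative) remedy for day-to-day drift in
    physiological signals. Rows are observations, columns are features."""
    mu = baseline_features.mean(axis=0)
    sd = baseline_features.std(axis=0) + 1e-9
    return (day_features - mu) / sd

# Hypothetical usage: 'rest' is recorded at the start of a drive,
# 'drive' during it; both arrays share the same feature columns.
rest = np.random.randn(60, 12)
drive = np.random.randn(300, 12) + 0.5
normalized = baseline_normalize(drive, rest)
```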

It is important to keep in mind that some people will feel more or less comfortable with each of these forms of communication. In a survey of nine regular users of traditional computing involving a big software package at MIT, with whom we were working to find ways to gather affective feedback about the product, five of them wanted a self-report button, whereas the other four wanted a beanbag, special surface, or mouse pressure sensor that they could hit, squeeze, or bang on. Such input, if mapped to an icon on the screen that registers disapproval, could be viewed as a nonverbal means of self-report, which the user was in control of at all times. If the mouse were the primary source of such hitting, or squeezing, or banging input, then the feedback could also occur concurrently with the task, giving the user another option. We thus built a mouse that senses pressure patterns but is only activated when the user directly points the mouse at a feedback icon (Figure 4). This mouse is part of new work by Carson Reynolds, who is developing an “Interface Tailor” that would adapt to user displays of positive or negative feedback. The mouse is a natural place for sensing signals from the user’s hand; Dana Kirsch in our lab built an earlier version for trying to sense positive or negative information,21 and IBM has built a version for sensing six emotions.22
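A minimal sketch of that gating logic follows; the icon location, pressure scale, and threshold are invented for illustration and are not the actual Interface Tailor code.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int; y: int; w: int; h: int
    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

FEEDBACK_ICON = Rect(x=980, y=10, w=32, h=32)   # illustrative screen location
PRESSURE_THRESHOLD = 0.8                         # illustrative; 0..1 scale

def handle_sample(cursor_xy, pressure: float):
    """Only treat squeeze/bang pressure as feedback when the cursor is over
    the feedback icon, mirroring the gating described for the pressure mouse."""
    if FEEDBACK_ICON.contains(*cursor_xy) and pressure >= PRESSURE_THRESHOLD:
        return "negative_feedback"
    return None

print(handle_sample((990, 20), 0.95))   # -> "negative_feedback"
print(handle_sample((400, 300), 0.95))  # -> None (pressure ignored elsewhere)
```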

It was interesting to discover that in our survey of the nine users, video sensing was not selected by any of them, and only one person selected microphone monitoring of vocal stress. Although facial and vocal expressions are the best-known social forms of human emotional expression, and new forms of sensing are beginning to offer more natural emotion communication, many people are not ready for this type of sensing. Users have strong feelings about if, when, where, and how they want to communicate their emotions, as well as about the privacy of their identifying and appearance information (face and voice). It would be absurd if developers of affective computing, which is ultimately about respecting human feelings, did not respect people’s feelings in the deployment of this technology. Our group’s ethic insists on giving users choices and control.

Figure 3  Sensor placement in experiments to learn about driver stress and other affective states that might influence driving safety [labeled elements in the schematic: camera, gas and brake pedals, skin conductivity (SC), blood volume pressure (BVP), electromyogram (EMG), respiration, and the data-collection computer]

Figure 4  Mouse equipped with resistive foam that senses pressure patterns to discern everyday gentle handling from pounding and other actions

Photo by Webb Chappell, copyright 1999 Webb Chappell; reprinted with permission

The above emphasis was on human-computer affect communication. Most of this also applies to human-human communication when mediated by a computer, although in the case of the latter it is not necessarily desirable that the machine interpret or recognize the emotional signals, only that the machine facilitate the priority and accuracy of their transmission. An example of a system designed explicitly to expand human-human communication capabilities via computer is the TouchPhone, developed by Jocelyn Scheirer in our group (Figure 5). The TouchPhone augments regular voice communication with pressure information, indicating how tightly the speaker is holding the phone. The pressure is mapped to a color seen by the person on the other side—calibrated to blue if light pressure is applied and to red if strong pressure. No interpretation of this signal is performed by the computer; the color signal is simply transmitted to the conversational partner as an additional channel of information.
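A minimal sketch of such a pressure-to-color mapping follows, assuming a grip-pressure reading normalized to the range 0 to 1; the linear hue sweep is an illustrative choice, since the TouchPhone’s actual calibration is not specified here.

```python
import colorsys

def pressure_to_rgb(pressure: float) -> tuple:
    """Map grip pressure in [0, 1] to a color from blue (light grip) through
    green, yellow, and orange to red (strong grip), by sweeping hue from
    240 degrees down to 0. The linear sweep is an illustrative choice."""
    p = min(max(pressure, 0.0), 1.0)
    hue = (240.0 * (1.0 - p)) / 360.0          # 240 deg = blue, 0 deg = red
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return int(r * 255), int(g * 255), int(b * 255)

print(pressure_to_rgb(0.0))   # light grip: blue
print(pressure_to_rgb(0.5))   # moderate grip: green/yellow region
print(pressure_to_rgb(1.0))   # strong grip: red
```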

I have met with four of my students over four hours of TouchPhone conversations, and the results, although anecdotal, were interesting and were consistent with other experiences we have had with this and other emotion-communication technologies we have been developing. I found that each of the four students had a nearly unique color flicker pattern, which was distracting at first. After I moved the color pattern into my periphery, it became an ambient background pattern that was no longer distracting, adding a flavor of background rhythm to the conversation. For one student, the pattern changed very slowly, becoming stable red when I started asking some research questions. I thought nothing of it, because he could have simply been squeezing the phone more tightly by shifting his position. However, even though he knew that I could not interpret his feelings from the color, he informed me that he was not trying to squeeze it tighter at all, and he thought it was red because he was stressed about a question I asked him. This person was a very nonexpressive, down-to-earth engineering student who had never revealed such signs of stress to me in the years of conversations we had had prior to this TouchPhone conversation. The technology thus facilitated his opening up a greater range of emotional communication, by his choice; it did not impose this on him, but it made it easier for him. The color was not an expression of how he or any of the students was truly feeling. However, the system provided a new channel of nonverbal communication that could, in turn, open up a new line of verbal communication. The TouchPhone is a new vehicle for carrying information that may change with emotion, but it will not reveal a person’s emotion.

Reducing user frustration

Not only do many people feel frustration and distress with technology, but they also show it. A widely publicized 1999 study by Concord Communications in the United States found that 84 percent of help-desk managers surveyed said that users admitted to engaging in “violent and abusive” behavior toward computers. A survey by Mori of 1250 people who work with computers in the U.K. reported that four out of five of them have seen colleagues hurling abuse at their PCs, and a quarter of users under age 25 admitted to having kicked their computer.

Figure 5  Use of the TouchPhone, illustrated by Scheirer; transmitted color ranges from blue, through green, yellow, orange, and finally red, as the pressure applied to the phone increases

It seems that no matter how hard researchers work on perfecting the machine and interface design, frustration can occur in the interaction. Researchers of human-computer interaction have worked hard to prevent frustration, which continues to be an important goal. However, despite their best efforts, an unforeseen situation often creates stress in the interaction. Even if computers were as smart as humans, they would still sometimes behave in a frustrating way, because even the most intelligent humans sometimes behave in a frustrating way. Hence, there is a need to address frustration at run time—detecting it, and responding to it.

I originally thought it would be easy to collect data from frustrated users. One could just ask them to sit in front of a computer running a particular software package, and—voila! Alternatively, one could hire an actor to express emotions, and record them. If the actor used method acting or another technique to try to self-induce true emotional feelings, the results would closely approximate emotions that arise in natural situations. A student using visualization techniques for feeling different emotions was the method we used in collecting 30 days of physiology data.19,23

However, these examples are not as straightforward as they may seem at first: they are complicated by issues such as the artificiality of bringing people into laboratory settings, the mood and skill of an actor, whether or not an audience is present, the expectations of subjects who think you are trying to frustrate them, the unreliability of a given stimulus for inducing emotion, the fact that some emotions can be induced simply by a subject’s thoughts (over which experimenters have little or no control), and the sheer difficulty of accurately sensing, synchronizing, and understanding the “ground truth” of emotional data. In response to these difficulties, we have begun to develop lab-based experimental methodologies for eliciting and gathering frustration data.

Building a system. In one methodology that we developed, we built a system that attempts to elicit and record signs of frustration from a user in a carefully controlled way using concurrent expression.24 This system involves users in an experiment whereby they try to solve a sequence of easy, visual perception puzzles as accurately and as quickly as possible in order to win a $100 prize. During the task, we introduce randomly appearing delays, where the mouse seems to stick and the screen does not advance to the next puzzle, thus damaging their score. Such an event is likely, although not guaranteed, to elicit frustration.
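A minimal sketch of how such randomly appearing delays might be scheduled is shown below; the delay probability, duration, and the present_puzzle helper are invented for illustration and are not the parameters of the actual experiment.24 Logging which puzzles were delayed is what later allows the physiological record to be aligned with the delay episodes.

```python
import random
import time

def run_puzzle_session(n_puzzles: int = 20,
                       delay_probability: float = 0.2,
                       delay_seconds: float = 3.0) -> list:
    """Illustrative scheduler for a frustration-elicitation task: before a
    randomly chosen subset of puzzles, the interface 'sticks' for a few
    seconds. The probability and duration here are made-up parameters."""
    delayed = []
    for puzzle in range(n_puzzles):
        if random.random() < delay_probability:
            delayed.append(puzzle)
            time.sleep(delay_seconds)   # mouse appears stuck; screen frozen
        present_puzzle(puzzle)          # hypothetical UI call
    return delayed                      # log of which puzzles were delayed

def present_puzzle(index: int) -> None:
    print(f"showing puzzle {index}")
```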

We compared features from two physiological signals, blood volume pressure and skin conductivity, during episodes when the mouse “stuck” vs episodes when all was going smoothly. Training hidden Markov models for each of these two categories, for each user, from part of the data, Raul Fernandez built an algorithm that tried to infer which category a person was in, given only the physiological features from unseen data. Initial results were significantly better than a random choice at detecting and recognizing the potential frustration episodes in 21 out of 24 users.25 However, the results were still significantly less than 100 percent, indicating that although this information is helpful, it must be combined with other signals for a more confident decision about the affective state.
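The following sketch shows the shape of that comparison, assuming the hmmlearn package and an arbitrary number of hidden states; it is an illustration of a stuck-versus-smooth likelihood test, not the code used in the study.25

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed library; not the lab's own code

def train_episode_model(episodes, n_states: int = 3) -> GaussianHMM:
    """Fit one HMM to a list of episodes, where each episode is an array of
    shape (time, 2) holding blood-volume-pressure and skin-conductivity
    features for one user. The number of states is an arbitrary choice."""
    X = np.vstack(episodes)
    lengths = [len(e) for e in episodes]
    model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def classify_episode(episode, stuck_model, smooth_model) -> str:
    """Label an unseen episode by whichever per-category HMM assigns it the
    higher log-likelihood, mirroring the stuck-vs-smooth comparison."""
    return ("stuck" if stuck_model.score(episode) > smooth_model.score(episode)
            else "smooth")
```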

Suppose that a computer could detect frustration with high confidence, or that a person directly reports frustration to the machine so that some kind of response is appropriate. How should the computer respond? Returning to the Reeves and Nass arguments, I believe it is important to explore how a successful human would respond, and see whether we can find a machine-appropriate way to imitate this response. “It looks like things did not go very well,” and “We apologize to you for this inconvenience” are examples of statements that humans use in helping one another manage frustration once it has occurred. Such statements are known to help alleviate strong negative emotions such as frustration or rage. But can a computer, which does not have feelings of caring, use such techniques effectively to help a user who is having a hard time? To investigate, we built an agent that practices some skills of active listening, empathy, and sympathy, according to the following strategy, which was designed based on other researchers’ analyses of successful human-human interaction (a sketch of such a dialog flow appears after the strategy):

Research goal: Reduce user frustration once it has occurred.

Strategy:

1. Recognize (with high probability) that the situation may be frustrating, or that the user is showing signs of frustration likely due to the system.

2. Is the user willing to talk? If so, then

● Practice active listening, with empathy and sympathy, e.g., “Good to hear it wasn’t terribly frustrating,” “Sorry to hear your experience wasn’t better,” “It sounds like you felt fairly frustrated playing this game. Is that about right?”

● Allow for repair, in case the computer has “misunderstood.”

● In extreme cases, the computer may even apologize, e.g., “This computer apologizes to you for its part in . . . ”

3. Provide polite social closure.
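A minimal sketch of this strategy as a scripted dialog flow is shown below; the ask helper, the trigger flag, and any wording outside the quoted example phrases are invented for illustration, and the agent avoids first-person references, in keeping with the design note that follows.

```python
def emotion_support_dialog(ask, frustration_likely: bool) -> None:
    """Minimal sketch of the three-step strategy as a scripted dialog.
    `ask` is a hypothetical function that shows a prompt in a text dialog
    box and returns the user's typed reply."""
    if not frustration_likely:
        return                                   # step 1: only act when warranted

    reply = ask("Would you like to say a few words about how the game went?")
    if not reply:
        ask("Thanks for playing. Good luck with the rest of the session!")  # step 3
        return

    # Step 2: active listening with empathy/sympathy, then allow repair.
    guess = ask("It sounds like you felt fairly frustrated playing this game. "
                "Is that about right?")
    if guess and guess.strip().lower().startswith("n"):
        ask("Sorry for the misunderstanding. How would you describe it?")
    else:
        ask("Sorry to hear your experience wasn't better. "
            "This computer apologizes to you for its part in that.")

    # Step 3: polite social closure.
    ask("Thanks for the feedback. Good luck with the rest of the session!")
```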

In developing this system, we took care to avoid language in which the computer might refer to itself as “I” or otherwise give any misleading implications of having a “self.” The system assesses frustration and interacts with the user through a text dialog box (with no face, voice, fancy animation, or other devices that might provoke anthropomorphism). The only aspect of the interaction that evokes the image of another person is the use of the English language, which although cleansed of references to self, nonetheless was made deliberately friendly in tone across all control and test conditions, so that “friendliness” would not be a factor in this study.

Using the system. The emotion-support agent was tested with 70 users who experienced various levels of frustration upon interacting with a simulated network game.26 We wanted to measure a strong behavioral indication of frustration, since self-report is notoriously unreliable. Thus we constructed a situation where people were encouraged to do their best while test-playing an easy and boring game, both to display their intelligence and to win one of two monetary prizes. Half of the subjects were exposed to an especially frustrating situation while they played (simulated network delays, which caused the game to freeze, thereby thwarting their attempt to display their intelligence or win a prize). Afterward, subjects would interact with the agent, which would attempt to help them reduce their frustration. Finally, they would have to return to the source of their frustration and engage again with the game, at which point we measured how long they continued to interact with it. Our prediction was based on human-human interaction: if someone frustrates you and you are still highly frustrated when you have to go back and interact with them, then you will minimize that interaction; however, if you are no longer feeling frustrated, you are likely to remain with them longer. The 2 × 3 experimental design is shown in Figure 6, where 34 users played the game in a low-frustration condition, and 36 played the same game with simulated delays.

Figure 6  The 2 × 3 experimental design: subjects first played the Web game for 5 minutes, either without delays (low-frustration condition) or with simulated Web delays (high-frustration condition); they then received one of three question sequences (no emotions discussed, invited to vent emotions, or the emotion-support agent), with 11 or 12 subjects per cell; finally, they played the game again for at least 3 minutes and completed an exit survey

We ran three cases for each of the low- and high-frustration conditions. The first two cases were controls, both of which were text-based, friendly interactions having essentially the same length as the emotion-support agent: the first (ignore) just asked about the game, ignoring emotions, and the second (vent) asked about the game, but then asked questions about the person’s emotional state and gave him or her room to vent (with no active listening, empathy, or sympathy). After interacting with one of the three cases (ignore, vent, or emotion-support), each player was required to return to the game and to play for three minutes, after which the quit button came on and they were free to quit or to play up to 20 minutes. Compared to people in the ignore and vent control groups, subjects who interacted with the emotion-support agent played significantly longer, demonstrating behavior indicative of a decrease in frustration. People in the ignore and vent cases both left much sooner, and there was no significant difference between their times of play.

These findings held true within the low-frustration condition and within the high-frustration condition: those who interacted with the emotion-support agent played significantly longer than those who interacted with similar agents that ignored their emotions (ignore) or just asked them about their emotion (vent). We also analyzed the data to see whether there were any significant effects with respect to gender, trait arousability, and prior game playing experience; none of these factors was significant. (For more details regarding this system and its human experiment and findings, see Klein et al.26) These results suggest that today’s machines can begin to help reduce frustration, even when they are not yet smart enough to identify or fix the cause of the frustration.

These findings were of no surprise to one colleague, visiting from a large computer firm. He described how his company had conducted a study of their customers to find which were most likely to buy their brand of computer again: those who had problems with the computer and received good support, or those who had no problems with the computer. The results showed that those who had problems and received good support were significantly more likely to buy this company’s brand again than were customers who had no problems with their computers. Their findings underscore the need for good customer support, which usually implies teams that have been carefully trained in the importance of practicing active listening, and in the appropriate use of empathy and sympathy. Our findings show that these skills, when practiced by a computer, can also have a strong behavioral effect.

Our findings in this area, and those of the large computer company, are disturbing, not only because of the increased tendency that companies have to rush products to customers that are barely 80 percent debugged, but because of the potential for deliberate manipulation of customers’ feelings. Could computers manipulate customers’ feelings? Yes. But would the technique used by the agent succeed repeatedly? Probably not. The rationale lies again in examining the nearest human-human equivalent. If a customer service agent repeatedly showed active listening, empathy, and sympathy for the source of the frustration, but never did anything to help solve the problem, the customer would wise up quickly. In contrast, if the agent played the role of a friend who supports the customer, but does not solve the customer’s problems, then that support might still be effective. I do not see the virtual friend technology as ready for commercial deployment anytime soon, and it raises many other concerns, but it is a future possibility, and affect communication would play an important role in its development.

This technology raises potential ethical concerns, most of which are not new to this technology. For example, advertising and persuasive technologies (see special issue of Communications of the ACM, May 1999) already focus on manipulating emotion, and many other forms of surveillance and sensing technology, coupled with networked databases, have researchers examining privacy implications. I expect that this technology will succeed best when it is applied only in the service of users’ needs, under their control, and for their benefit. The interested reader is referred to the discussions in Klein’s thesis27 and in Chapter 4 of the author’s book4 for specific issues related to sensitive handling of affective information.

Developing applications

Affective computing suggests many new applications and variations on existing applications, in entertainment, in the arts, in education, and in regular human-computer interaction. Many such possibilities have been envisaged in the author’s 1997 book4 (Chapters 3 and 8), and this section will highlight some of the progress we have made since 1997. In each case, it is important to keep in mind that what we are aiming for is not a one-size-fits-all system, but a system that can adapt to each user in a way that is sensitive and respectful of his or her expressed feelings.

As mentioned above, we have designed and built affective wearable computers that sense information from a wearer going about daily activities.18 Some of these wearables have been adapted to control devices for the user, such as a camera that saves video based on one’s arousal response,28 tagging the data not just with the usual time stamp, but also with information about whether or not it was exciting, as indicated by patterns in the user’s skin conductivity. Using similar information, our group has built a wearable “DJ” that not only tries to select music from performers that are preferred, but also attempts to adjust its selection based on a feature of the user’s mood.29 From prior listening, it can watch which songs the user prefers to listen to when calm or active, and which tend to calm or pep up the user more. Figure 7 shows the skin conductivity for one wearer while listening to seven songs and otherwise relaxing. The level is noticeably higher for songs 2 and 6, agreeing with the subject’s report that these songs were perceived as energizing. The level is noticeably lower for songs 3 and 7, agreeing with the subject’s report that these were relatively calming. If the subject chooses, he or she can tell the system such things as “choose some jazz to pep me up,” and it will select music from preferences that have these features.
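A minimal sketch of that selection step follows; the song library, genre tags, and per-song “arousal effect” scores are invented for illustration and are not the Affective DJ’s actual model.29

```python
# Each entry pairs a song with a learned "arousal effect": positive values
# mean the wearer's skin conductivity tended to rise while it played,
# negative values mean it tended to fall. Library and scores are made up.
SONG_LIBRARY = [
    {"title": "Song A", "genre": "jazz", "arousal_effect": +0.6},
    {"title": "Song B", "genre": "jazz", "arousal_effect": -0.4},
    {"title": "Song C", "genre": "rock", "arousal_effect": +0.8},
]

def choose_song(genre: str, goal: str) -> dict:
    """Pick the preferred-genre song whose past effect best matches the
    request: 'pep up' favors arousal-raising songs, 'calm down' the reverse."""
    candidates = [s for s in SONG_LIBRARY if s["genre"] == genre] or SONG_LIBRARY
    key = (lambda s: s["arousal_effect"]) if goal == "pep up" \
          else (lambda s: -s["arousal_effect"])
    return max(candidates, key=key)

print(choose_song("jazz", "pep up"))   # -> Song A in this toy library
```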

One thing to be clear about is that this system does not impose choices on the user; for example, it will not try to “pep up” the user unless he or she has instructed it (directly or indirectly) to do so. It remains under the user’s ultimate control, which I consider to be an important factor. The user can turn it on or off, and the user retains control over the types of music it will play. Ideally, affective systems should never impose something on their user unless the user indicates that he or she wants the imposition. If affective technology annoys the user, it has probably not succeeded.

An application in the creative arts is the development of a highly expressive wearable system: the Conductor’s Jacket. Figure 8 shows one of the early prototypes designed and built by my research assistant, Teresa Marrin. The jacket maps patterns of muscle tension and breathing into features that shape the music. Seven electromyogram (EMG) sensors and one respiration sensor are included in the version shown here. This wearable system was built first to measure how professional and student conductors naturally communicate expressive information to an orchestra. After analyzing real conducting data from six subjects, Marrin found about 30 significant expressive features and hypothesized several principles of communication via musical gestures.30 She has subsequently developed a version of the jacket that can transform natural expressive gestures of the wearer into real-time expressive shaping of MIDI (Musical Instrument Digital Interface) performances.31 A professional conductor herself, Marrin has performed in the jacket in several large venues with great acclaim. She is currently looking at how the jacket can be used to help educate student conductors, giving them more precise feedback on their timing, tension, and other important aspects of expressive technique.
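As a rough illustration of gesture-to-MIDI shaping (not Marrin’s actual mapping), the following sketch smooths a rectified EMG signal and scales the envelope to MIDI note velocities; the filter constant and scaling are assumptions.

```python
def emg_to_velocity(emg_samples, smoothing: float = 0.9) -> list:
    """Illustrative gesture-to-MIDI mapping: smooth the rectified EMG signal
    with a simple exponential filter and scale the envelope to MIDI note
    velocities (0-127). The filter constant and scaling are assumptions and
    not the Conductor's Jacket's actual mapping."""
    envelope = 0.0
    velocities = []
    peak = max(abs(s) for s in emg_samples) or 1.0
    for sample in emg_samples:
        envelope = smoothing * envelope + (1.0 - smoothing) * abs(sample)
        velocities.append(min(127, int(127 * envelope / peak)))
    return velocities

# Toy usage: a swell in muscle tension produces a crescendo in velocity.
print(emg_to_velocity([0.1, 0.2, 0.5, 0.9, 1.0, 0.6, 0.2]))
```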

During the course of this research, we have realized that there are many parallels between autistics, who tend to have severely impaired social-emotional skills, and computers, which do not “get it” when it comes to emotion. Both also tend to have difficulty generalizing, even though both can be fabulous at certain pattern recognition tasks. Because many of the issues we face in giving computers skills of emotional intelligence are similar to those faced by therapists working with autistics, we have begun some collaboration with these experts. Current intervention techniques for autistic children suggest that many of them can make progress recognizing and understanding the emotional expressions of people if given lots of examples to learn from and extensive training with these examples.

Figure 7  Skin conductivity for a person listening to seven songs; the signal is used by the “Affective DJ” to aid in song selection (vertical axis: skin conductivity in microsiemens, roughly 4.0 to 7.5; horizontal axis: time in minutes, 0 to 22, with the intervals for Songs 1 through 7 marked)

We have developed a system that is aimed at helping young autistic children learn to associate emotions with expressions and with situations. The system plays videos of both natural and animated situations giving rise to emotions, and the child interacts with the system by picking up one or more stuffed dwarfs that represent the set of emotions under study and that wirelessly communicate with the computer. This effort, led by Kathi Blocher, has been tested with autistic children ages 3–7 years. Within the computer environment, several children showed an improvement in their ability to recognize emotion.32 More extensive evaluation is needed in natural environments, but there are already encouraging signs that some of the training is carrying over. Nonetheless, this work is only one small step up a huge mountain; the difficulties in teaching an autistic to appropriately respond to an emotional situation are vast, and we will no doubt face similar difficulties for a long time in trying to teach computers how to respond appropriately.

Affective computing is in its infancy, but its potential applications are vast. Our research group is continuing to develop this technology while exploring new application areas: how devices that help communicate emotion can be used in focus groups and in audience-performer interaction, how affect-sensing in the car might improve safety, how wearable devices can boost emotional awareness and help wearers regulate stress and potentially improve health, how emotion can be sensed and responded to respectfully in e-commerce, and how emotion communication may potentially improve a learning experience for children or adults. Emotion is a natural part of many of life’s experiences, but so far computers have largely ignored it. As computers become ubiquitous, and as users demand more customized, intelligent, adaptive interactions, I think it will become increasingly important for affective communication to be enabled in this interaction.

Concluding remarks

This paper has highlighted several research projects in the Affective Computing Research Group of the MIT Media Lab. The emphasis has been on illustrating new technology that can begin to recognize and help communicate aspects of emotional expression and to respond to it in an appropriate way. This area of research is very new, and many other laboratories have recently started similar projects, so that it would take a much longer paper to provide an overview of all the research in this area. Readers who are interested in related work are encouraged to peruse the references of the papers and the workshop proceedings listed at the end of this paper, all of which contain many pointers to related research conducted beyond MIT.

Figure 8  Schematic of sensors in the Conductor’s Jacket: a trapezius EMG sensor (on the back), a biceps EMG sensor, an opponens pollicis EMG sensor, a forearm extensor EMG sensor, a respiration sensor, and the cable connection

Over the years, scientists have aimed to make machines that are intelligent and that help people use their native intelligence. However, they have almost completely neglected the role of emotion in intelligence, leading to an imbalance on a scale where emotions are almost always ignored. I do not wish to see the scale tilted out of balance in the other direction, where machines twitch at every emotional expression or become overly emotional and utterly intolerable. However, research is needed to learn about how affect can be used in a balanced, respectful, and intelligent way; this should be the aim of affective computing as we develop new technologies that recognize and respond appropriately to human emotions.

Cited references

1. B. Reeves and C. Nass, The Media Equation, Center for the Study of Language and Information, Stanford University (1996).
2. D. Goleman, Emotional Intelligence, Bantam Books, New York (1995).
3. P. Salovey and J. D. Mayer, “Emotional Intelligence,” Imagination, Cognition, and Personality 9, No. 3, 185–211 (1990).
4. R. W. Picard, Affective Computing, MIT Press, Cambridge, MA (1997).
5. “Workshop on Grounding Emotions in Adaptive Systems,” D. Canamero, C. Numaoka, and P. Petta, Editors, 5th International Conference of the Society for Adaptive Behavior (SAB ’98), Zurich (August 1998); http://www.ai.univie.ac.at/;paolo/conf/sab98/.
6. “Emotional and Intelligent: The Tangled Knot of Cognition,” D. Canamero, Editor, AAAI Fall Symposium, Orlando, AAAI Press, Menlo Park, CA (October 1998).
7. Workshop on Emotion-Based Agent Architectures (EBAA ’99), J. D. Velasquez, Editor, in conjunction with the Autonomous Agents Conference, Seattle (May 1999).
8. Emotions in Humans and Artifacts, R. Trappl and P. Petta, Editors, collected papers from Vienna workshop (August 1999); to be published by MIT Press, Cambridge, MA.
9. “Human-Computer Interaction: Ergonomics and User Interfaces,” Volume 1, H.-J. Bullinger and J. Ziegler, Editors, special session on Affective Computing and HCI, Proceedings of HCI International ’99, Munich, Lawrence Erlbaum Associates, Mahwah, NJ (August 1999).
10. Affect in Interactions: Towards a New Generation of Interfaces, A. Paiva, Editor, Springer Books, to appear; http://gaiva.inesc.pt/i3ws/i3workshop.html (1999).
11. J. D. Velasquez, “Modeling Emotions and Other Motivations in Synthetic Agents,” Proceedings of AAAI 97 (1997), pp. 10–15.
12. J. D. Velasquez, “A Computational Framework for Emotion-Based Control,” Workshop on Grounding Emotions in Adaptive Systems, Fifth International Conference on Simulation of Adaptive Behavior, Zurich (August 1998).
13. H. Ishii and B. Ullmer, “Tangible Bits: Towards Seamless Interfaces Between People, Bits and Atoms,” Proceedings of Conference on Human Factors in Computing Systems (CHI ’97), Atlanta, ACM Press (March 1997), pp. 234–241.
14. M. Norwood, Affective Feedback Devices for Continuous Usability Assessment, S.M. thesis, MIT, Media Arts and Sciences, Cambridge, MA (2000).
15. J. Scheirer, R. Fernandez, and R. W. Picard, “Expression Glasses: A Wearable Device for Facial Expression Recognition,” CHI ’99 Short Papers, Pittsburgh, PA (1999).
16. T. B. Sheridan, “Community Dialog Technology,” Proceedings of the IEEE 63, No. 3, 463–475 (March 1975).
17. T. Starner, S. Mann, B. Rhodes, J. Levine, J. Healey, D. Kirsch, R. Picard, and A. Pentland, “Augmented Reality Through Wearable Computing,” Presence 6, No. 4, 386–398 (1997).
18. R. W. Picard and J. Healey, “Affective Wearables,” Personal Technologies 1, No. 4, 231–240 (1997).
19. E. Vyzas and R. W. Picard, “Online and Offline Recognition of Emotion Expression from Physiological Data,” Workshop on Emotion-Based Agent Architectures at the International Conference on Autonomous Agents, Seattle, WA (1999).
20. J. Healey, J. Seger, and R. W. Picard, “Quantifying Driver Stress: Developing a System for Collecting and Processing Bio-Metric Signals in Natural Situations,” Proceedings of the Rocky-Mt. Bio-Engineering Symposium, Boulder, CO (1999).
21. D. Kirsch, “The Sentic Mouse: Developing a Tool for Measuring Emotional Valence,” S.M. thesis, MIT, Media Arts and Sciences (May 1997).
22. W. Ark, D. C. Dryer, and D. J. Lu, “The Emotion Mouse,” Proceedings of HCI International ’99, Munich, Germany (August 1999).
23. J. Healey and R. W. Picard, “Digital Processing of Affective Signals,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA (May 1998).
24. J. Riseberg, J. Klein, R. Fernandez, and R. W. Picard, “Frustrating the User on Purpose: Using Biosignals in a Pilot Study to Detect the User’s Emotional State,” CHI ’98 Short Papers, Los Angeles, CA (1998); a longer version of this paper has been accepted for publication in Interaction with Computers.
25. R. Fernandez and R. W. Picard, “Signal Processing for Recognition of Human Frustration,” Proceedings of IEEE ICASSP ’98, Seattle, WA (1998).
26. J. Klein, Y. Moon, and R. W. Picard, “This Computer Responds to User Frustration,” CHI ’99 Short Papers, Pittsburgh, PA (1999); a longer version has been accepted for publication in Interaction with Computers.
27. J. Klein, “Computer Response to User Frustration,” S.M. thesis, MIT, Media Arts and Sciences, Cambridge, MA (1998).
28. J. Healey and R. W. Picard, “StartleCam: A Cybernetic Wearable Camera,” Proceedings of the International Symposium on Wearable Computing, Pittsburgh, PA (1998).
29. J. Healey, F. Dabek, and R. W. Picard, “A New Affect-Perceiving Interface and Its Application to Personalized Music Selection,” Proceedings of 1998 Workshop on Perceptual User Interfaces, San Francisco, CA (1998).
30. T. Marrin and R. W. Picard, “Analysis of Affective Musical Expression with the Conductor’s Jacket,” Proceedings of XII Colloquium on Musical Informatics, Gorizia, Italy (1998).
31. T. Marrin, Inside the Conductor’s Jacket: Analysis, Interpretation, and Musical Synthesis of Expressive Gesture, Ph.D. thesis, MIT, Media Arts and Sciences, Cambridge, MA (1999).
32. K. Blocher, “Affective Social Quotient (ASQ): Teaching Emotion Recognition with Interactive Media and Wireless Expressive Toys,” S.M. thesis, MIT, Cambridge, MA (May 1999).

Accepted for publication April 7, 2000.

Rosalind W. Picard MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139-4307 (electronic mail: [email protected]). Dr. Picard is founder and director of the Affective Computing Research Group at the MIT Media Laboratory. She holds a bachelor’s degree in electrical engineering from the Georgia Institute of Technology and a master’s degree and doctorate, both in electrical engineering and computer science, from MIT. She was named a National Science Foundation Graduate Fellow in 1984 and worked as a member of the technical staff at AT&T Bell Laboratories from 1984–1987, designing very large scale integration (VLSI) chips for digital signal processing and developing new methods of image compression and analysis. In 1991 she joined the MIT faculty, in 1992 was appointed to the NEC Development Chair in Computers and Communications, and in 1995 was promoted to associate professor. The author of over 70 peer-reviewed scientific articles in pattern recognition, multidimensional signal modeling, and computer vision, Dr. Picard is known internationally for pioneering research into digital libraries and content-based video retrieval. She is corecipient with Tom Minka of a best paper prize (1998) from the Pattern Recognition Society for their work on interactive machine learning with multiple models. Dr. Picard was guest editor for the special issue of the IEEE Transactions on Pattern Analysis and Machine Intelligence on “Digital Libraries: Representation and Retrieval,” and edited the proceedings of the First IEEE International Workshop on Content-Based Access of Image and Video Libraries, for which she served as chair. She presently serves as an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence, as well as on several scientific program committees and review boards. Her recent award-winning book, Affective Computing (MIT Press, 1997), lays the groundwork for giving machines the skills of emotional intelligence.
