DEGREE PROJECT IN MEDIA TECHNOLOGY, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

A Pondering Robot
How gestures can indicate loading in robots

LINETTE NILSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Abstract

Waiting while computers and websites process and load information is a common occurrence, and research shows that loading indicators can be used to elevate the user experience. Research on loading time for humanoid robots has so far mainly focused on the effects of auditory processing indicators, or of auditory processing indicators in combination with bodily gestures. This study instead discusses and evaluates the effects of solely using bodily gestures.

Previous research and interviews with two theatre professionals inspired two gestures that were designed for this study and deployed to a humanoid robot, Pepper. An in-the-wild study was conducted at a bank where 24 participants interacted with the robot, half of them with and half without loading gestures. Afterwards, the participants answered a complementary survey.

The results show that the gestures used in this study had an overall negative effect on the user experience. The implemented gestures may have confused the users, who possibly had other expectations when interacting with the robot, which in turn would have affected the user experience negatively. These could be expectations based on previous encounters with other Pepper robots or other humanoid robots, or expectations formed when initiating the interaction in this study.

Future studies in the field of human-robot interaction could, for example, focus on the use of more exaggerated gestures, or on how stress could affect participants' expectations of their interactions with humanoid robots. Should this study be repeated, a recommendation would be to conduct the test in a more clinical environment, to keep more parameters constant throughout the study.


Sammanfattning

Waiting for computers and websites to load is a common occurrence, and research shows that loading indicators can be used to enhance the user experience. Research on loading indicators for robots has so far focused on the effects of auditory loading indicators, or of auditory loading indicators combined with bodily gestures. This degree project instead examines and evaluates the effects of using only bodily gestures to indicate loading in humanoid robots.

Previous research and interviews with two theatre experts inspired two gestures that were designed for this study. The gestures were deployed to a humanoid robot of the Pepper model. An in-the-wild study was conducted at a bank, where 24 participants got to interact with the robot: half of them interacted with the robot with the gestures, and half with a static robot. The participants then got to elaborate on their answers by filling in a survey.

The results suggest that the gestures used in this study affected the user experience negatively. The gestures may have confused the users, as they may have had other expectations of the interaction with the robot, which in turn may have affected the user experience negatively. These could be expectations based on previous interactions with Pepper or other humanoid robots, or expectations set when the interaction was initiated during this study.

Future studies within human-robot interaction could, for example, study more exaggerated gestures, or examine how stress can affect users' expectations of their interactions with humanoid robots. Should this study be recreated, I would recommend conducting the evaluation in a more clinical environment, so that more parameters can be kept constant throughout the study.


A Pondering Robot
How gestures can indicate loading in robots

Linette Nilsson
[email protected]

School of Electrical Engineering and Computer Science
Royal Institute of Technology

Stockholm, Sweden

ABSTRACT

Waiting while computers and websites process and load information is a common occurrence, and research shows that loading indicators can be used to elevate the user experience. Research on loading time for humanoid robots has so far mainly focused on the effects of auditory processing indicators, or of auditory processing indicators in combination with bodily gestures. This study instead discusses and evaluates the effects of solely using bodily gestures.

Previous research and interviews with two theatre professionals inspired two gestures that were designed for this study and deployed to a humanoid robot, Pepper. An in-the-wild study was conducted at a bank where 24 participants interacted with the robot, half of them with and half without loading gestures. Afterwards, the participants answered a complementary survey.

The results show that the gestures used in this study had an overall negative effect on the user experience. The implemented gestures may have confused the users, who possibly had other expectations when interacting with the robot, which in turn would have affected the user experience negatively. These could be expectations based on previous encounters with other Pepper robots or other humanoid robots, or expectations formed when initiating the interaction in this study.

Future studies in the field of human-robot interaction could, for example, focus on the use of more exaggerated gestures, or on how stress could affect participants' expectations of their interactions with humanoid robots. Should this study be repeated, a recommendation would be to conduct the test in a more clinical environment, to keep more parameters constant throughout the study.

CCS CONCEPTS

• Human-centered computing → Interaction techniques; Gestural input; Usability testing; User centered design.

KEYWORDS

Human Robot Interactions, User Experience, Loading Time, Interaction Design

1 INTRODUCTION

Technology has developed and improved throughout the years and is capable of processing data faster than ever before. The amount of data to process has also increased, for example because of higher-quality graphics and more complex tasks. Therefore, we still have to wait while applications or websites load. The delay from submitting data until the next interaction will, in this paper, be referred to as loading time. How to handle loading time is a heavily researched subject, and long or poorly executed loading interactions have affected, and continue to affect, the user experience negatively [2, 6, 13, 18].

There are many ways of handling loading time. One of the most common ways of indicating loading is to implement visual processing indicators such as progress bars, spinners or throbbers. Loading indicators can also be non-visual, for example auditory indicators, which are used in refrigerators to warn that the door has not been closed. Mechanical and humanoid robots are not excluded from the need to handle loading time, and more research is needed within this area in order to implement more suitable processing indicators.
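For readers unfamiliar with the terminology, the simplest of these visual indicators can be sketched in a few lines. The console throbber below is purely illustrative and unrelated to the study's implementation:

```python
import itertools
import sys
import time

def throbber(duration_s, message="Loading"):
    """Render a rotating text spinner for duration_s seconds."""
    frames = itertools.cycle("|/-\\")
    deadline = time.time() + duration_s
    while time.time() < deadline:
        sys.stdout.write("\r%s %s" % (message, next(frames)))
        sys.stdout.flush()
        time.sleep(0.1)
    sys.stdout.write("\r%s done.\n" % message)

throbber(2.0)
```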

As humanoid robots primarily work and interact with humans, it is crucial that they act in a way humans are comfortable with and used to. When studying how people interact with other people, gestures and body language play an important role in how we communicate. The interplay between people is complex, and each person needs to continuously adapt to the changing dynamics of the interaction.

Research on robot loading interaction has so far mainly focused on studying the effects of audible processing indicators, or of audible processing indicators in combination with bodily gestures. To contribute to the field of human-robot interaction, this paper will study and discuss the possible effects of only implementing gestures in robot interactions. The focus will be on how gestures used during loading can affect the user's perceived wait and overall user experience. To discuss this research question, I will explore different types of gestures that could be suitable for a humanoid robot to indicate loading. To explore this, interviews were conducted with two professionals working in theatre, as they have experience of conveying signals to others by exclusively using movements.

2 RELATED RESEARCH

Robot loading interaction is a subject that consists of several research areas, and each will be further explained in dedicated sections below. To better understand how to design a loading interaction for a humanoid robot, it is first important to comprehend why loading time is an issue and how this issue has been dealt with previously in other applications.

Previous research projects conducted within the field of human-robot interaction have tackled the struggles of loading time in different ways. Auditory indicators and obvious, almost cartoon-like gestures, such as the robot stroking its chin or scratching the top of its head, have previously been implemented and studied [11]. To further explore gestures for the humanoid robot Pepper, I turned to theatre and mime, as these are fields with expertise in communicating using solely gestures. Interviews were conducted with two theatre and mime experts to discuss how gestures are perceived by others, as well as to get professional input on what kind of gestures would be suitable to implement on Pepper. This is presented in the results.

Loading time

A study about web page dwell time performed by Microsoft Research suggests that fast-loading websites and applications are an important factor in keeping users on websites [6]. Around 49% of users are said to expect a website to load in two seconds or less, and 18% expect websites to load instantly [2]. Long loading times contribute to increased frustration and dissatisfaction with the product, which in turn can lead to loss of revenue [2, 18]. A study done on Walmart.com, where the slowest 5% of users had to wait for up to 24 seconds, showed that for every 100 milliseconds of improvement, incremental revenue grew by up to 1% [13].

The perceived loading time, however, does not always equal the actual loading time. Research shows that the average web user perceives loading times to be 15% slower than they are in reality, and 35% slower when recalling the experience in hindsight [2]. As a concrete example, a load that actually takes four seconds would thus be experienced as roughly 4.6 seconds and remembered as roughly 5.4 seconds. A study based on the time perception phenomenon called Vierordt's law, the overestimation of short durations and underestimation of long durations, showed that satisfaction declined as the estimated duration increased [23]. This is where implementing loading indicators can be beneficial, as they have been shown to make people perceive the loading time as shorter, and a shorter perceived loading time contributes to a more positive user experience [2, 16].

In some cases, a short loading time is preferred over no wait at all, due to people's expectations and previous experiences. A loading time that can be estimated based on similar events is called a predictive delay [20]. Haering and Thomaschke argue in their paper that adhering to the predictive delay time can improve the overall user experience. Persson writes that the expectation of delay is the reason why some services will have a longer loading time than needed, for example for payments, as the delay has been "suggested to induce trust" [14].

Human Interactions

Words, gaze and movement all play an important role when people interact with others. Pauses are also essential in communication. Researchers report that the natural pause in "ordinary" human-to-human communication in Japanese is around 0.59 seconds [10]. During conversations, people often use conversational fillers (words such as "um", "uh", and "like") when they need time to think of a reply or when they have lost their train of thought. Conversational fillers, CFs, are most commonly used at the beginning of a sentence, where the speaker needs to simultaneously process what they just heard and form an answer [1]. It has also been hypothesised that CFs are used to prevent interruption from listeners while the speaker is thinking [8].

People usually gaze at their listeners during conversations in order to read the situation and predict where the conversation is heading. People's eyes tend to wander when they are thinking, and it is believed that eventual visual distractions might "interfere with the cognitive planning" [5]. Commonly, people tend to gaze to the left, especially in combination with uttering CFs [15].

Robot Interactions

Toshiyuki Shiwa et al. found that when interacting with a robot, people tend to prefer a short, one-second delay rather than an immediate response [19]. However, the positive user experience decreased past two seconds of loading time. They believe a reason for this might be a consequence of the visual similarity that robots have to humans. As humans naturally tend to have a delay in their responses, people could expect a human-like robot to mimic that behaviour.

Several researchers have found that the use of conversational fillers in robot interactions improves the users' patience during long loading times and improves their experience of interacting with the robots [4, 11, 19, 22]. A study conducted by Noel Wigdor et al. on 26 children showed that CFs "can improve the perceived speediness, aliveness, humanness, and likability of the robot, without decreasing perceptions of intelligence, trustworthiness, or autonomy" [22].


Figure 1: Diagram of Mori's uncanny valley (simplified and translated by MacDorman in 2005)

Previous studies focusing on specific scenarios and gestures (sometimes in combination with CFs) have been shown to improve users' perception of robots as more life-like and to make the user experience more pleasant [4, 9, 11, 21, 22]. Examples of these are adapting the space between a humanoid robot and the user based on the user's body language [9], or using exaggerated thinking gestures in combination with CFs when interacting with children [22]. However, research on the effects of solely using bodily fillers (for example, the touching of arms or fidgeting) is, as of now, lacking.

Uncanny Valley

Masahiro Mori, a professor in robotics, hypothesized in 1970 that people's positive responses to human-like robots would be negatively affected when a robot's appearance approaches, but fails to attain, a lifelike appearance. A simplified illustration of Mori's theory is visualised in figure 1 [7].

In a study on predictive coding (the brain expecting or trying to predict events) and the uncanny valley in human-robot interaction, Saygin et al. found that "... there was a mismatch between the human-like appearance and the mechanical motion, leading to a larger prediction error, manifest as activity in relevant brain regions" [17]. If appearance, movements and sound do not align with a person's perceptual expectations, it can impact the user negatively [3]. It is therefore important to match movements and gestures to how human-like the robot is perceived to be, in order to create a positive user experience.

3 METHOD

The robot Pepper, produced by SoftBank Robotics, was chosen for this project as it is capable of performing gestures and is one of the best-selling humanoid robots in the world. As of 2018, around 12 000 Peppers had been bought in Europe alone [12].

Two theatre professionals were interviewed. Both have studied theatre and worked as directors, actors and mimes, and are currently teachers at the Calle Flygare theatre school in Stockholm. Theatre professionals were chosen for the interviews as they are experts on movement and have knowledge of how different sets of gestures can be used to convey certain messages to spectators. The interviews were semi-structured, giving the professionals freedom to explore different ideas. The focus of the interviews was to discuss how gestures can be perceived by others and what kind of movement could be suitable for Pepper. A central point of both discussions was how humans use gestures to be perceived as thinking, as thinking is the human equivalent of loading in machines.

The implemented gestures were inspired by and based on the interviews with the professionals and on previous research within robot and human interactions. The chosen gestures are described in greater depth in the results. The implementation was evaluated through A/B testing, so that the results of implementing loading gestures could be compared to the experience of not having gestures. Each user was randomly assigned to interact with the robot either with or without the loading gestures implemented. In total, 24 people interacted with the robot: 12 of them interacted with the robot without any loading gestures, which will be called the static group, and 12 interacted with loading gestures, called the gestures group.
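Since participants arrived one at a time, the random assignment can be pictured as below. This is an illustrative sketch only; the paper states the group sizes but not how the randomisation was implemented:

```python
import random

GROUP_SIZE = 12  # 12 participants per condition, as in the study

def next_condition(counts, rng=random):
    """Pick a condition for the next walk-in participant,
    capping both groups at GROUP_SIZE (illustrative only)."""
    open_groups = [g for g in ("static", "gestures") if counts[g] < GROUP_SIZE]
    choice = rng.choice(open_groups)
    counts[choice] += 1
    return choice

counts = {"static": 0, "gestures": 0}
print(next_condition(counts))  # e.g. "gestures"
```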

For three days (a Friday, Monday and Tuesday in November), Pepper stood in the reception of a bank in central Stockholm and gathered data during office hours. The study was conducted in-the-wild, and the participants were not informed beforehand of the goal of the study. 24 participants took part in the study; the median age was 43, the youngest participant was 21 and the oldest 64. Seven were women and 17 were men. The static group consisted of eight men and four women, and the gestures group of nine men and three women.

The loading time was kept constant at four seconds for both groups, with and without the gestures, in order to prevent time variations between test occasions. A loading time of four seconds was chosen as previous studies [2, 19, 22] show that loading times exceeding two seconds negatively affect the user experience.

To keep the user engaged in the interaction, a goal-oriented scenario was chosen for the evaluation; keeping the interaction short minimizes the risk of the user getting distracted.


The scenario chosen for this project was checking in at a reception. The participant enters the bank and, on the robot's touch screen, selects the option to check in. For the first round, Pepper asks "Whom are you meeting?", which the user then inputs on the screen, and Pepper performs the first loading gesture. For the second round, Pepper asks "What is your name?", the user inputs their name on the screen, and Pepper performs the second loading gesture. For the third round, Pepper asks the user for the company they are representing and, after the user inputs it on the screen, performs the first loading gesture again. Pepper ends the interaction by telling the user to collect their visitor badges at the reception, and a "Good bye" button is shown on screen.

At the reception, the participant is presented with the option to fill out a complementary survey for the study. In the survey, the participants rate, on a Likert scale, whether they were in a hurry, their impressions of the robot, and their thoughts on the check-in process.

4 RESULTS

The results are presented in three parts. First, the interviews with the professionals, where we discussed how people can perceive gestures and how gestures can be used to be perceived in certain ways. Then, the implemented gestures are described and visualised. Lastly, the results of the study are presented, including the quantitative data gathered from the interactions and the qualitative data gathered through the survey.

Interviewing the professionals

Two professionals and experts on gestures and movement were interviewed in order to design suitable movements for the robot: Martin Hasselgren (actor, mime, director and choreographer) and Ulf Wahlström Rönell (artist, educator, director, mime and clown). Both talked about how movements should start from the chest and that the movement and positioning of the chest is an important pillar within the teachings of theatre. Ulf said "... in my world it is what happens here [the chest] that matters". He further explained that an exposed chest (see Figure 2, Bust 1) emits confidence and can in certain situations be perceived as either confident or aggressive, while a curved back can be perceived as timid or insecure (see Figure 2, Bust 2). Martin said that all schools that teach mime teach their students that every "movement uses the center of the body [chest and stomach] as a starting point. It's an impulse [...]" and that the movements are then carried out and enhanced by the movements of the rest of the body.

Figure 2: Illustration of an exposed chest (bust 1) and a curved back (bust 2)

Both professionals were aware that robots are limited in some of their movements. Martin pointed out the difficulty of trying to reproduce this "impulse" that starts from the central regions of the body, as the robot Pepper is limited in its freedom of movement in this particular region.

A thought that was brought up in both interviews was: "As the robot is clearly distinguishable from a real human, should it use other ways of expression, unlimited by the human limitations of movement, and what could it use in this case?" Martin brainstormed about how Pepper could be given a "character" and that "the character could be as human as we would like or be as artistically elevated as we would wish it to be". He talked about how a character for Pepper could have either small and delicate or big and assertive movements, depending on the environment it will be situated in. He said "If Pepper is going to be a traffic police, then it should not use small and delicate movements. It needs to emit assertiveness through clear and thoughtful movements". Ulf brought up a workshop he took part in, about 15 years ago, that used a mechanical, non-humanoid arm. While the choreography it performed was far from human-like, he said that some movements would make the audience think "ah, that meant something, that movement had a meaning behind it". Humans like to find a "humanness" in objects, and even in movements that clearly are not human.

Ulf discussed both subtle movements, such as leaning forward and backwards, and exaggerated movements, such as touching the chin, for conveying thinking, and talked about reasons to use one or the other. If the objective is to imitate natural human behaviour, then subtle movements are better suited, while exaggerated movements can instead be used for a comical twist. If the robot is interacting with children, it could be more beneficial to use exaggerated, cartoon-like gestures, as children are likely familiar with such movements through movies. For this study, both Ulf and Martin recommended exploring more subtle movements, as Pepper has many human-like features. This seems appropriate as the users are likely adults and the robot will be in a professional environment. Exaggerated movements would let the users know that "the robot is thinking", but only because such gestures are commonly used to exaggerate that a character is thinking, not necessarily because they are perceived as a human thinking.

Martin discussed the meaning of gestures and how it can differ between different regions of the world. He said that "while body language and movements are universal, gestures are not". For example, he pointed out that a sideways head movement can be associated with a "no" or with denying something in western countries, because of its similarity to shaking one's head, while such a movement in India would instead mean "yes". How we as people perceive a gesture can therefore be based on, for example, our culture and previous experiences. He also discussed how gestures can be presented differently depending on age and sex, which all play a role in how gestures and movements are perceived by others.

Since Pepper's shoulders are static and its movements are rather limited, Ulf proposed that the robot could lean slightly backwards in order to mimic a curving back. He also suggested breaking eye contact with the user and slightly tilting the head to convey a sense of thinking. Once the robot is ready to continue the interaction, it should straighten its back, perhaps lean forward, and then "reconnect a contact", for example through eye contact or by continuing the dialogue. He stressed the importance of avoiding fast and harsh changes in movement and speed, as people tend to perceive them as unnatural and perhaps mechanical; those types of movements are often associated with robotic movement. A movement should instead have a soft transition both into and out of itself. He said "Ideally, the previous motion should almost linger into the new motion, like a shadow of the previous movement". This can be perceived as thinking, or almost as being absent-minded.

Gestures

Based on the interviews with the professionals and previous research, two sets of gestures were designed and implemented for this study. For example, averting of gaze was mentioned both in the interviews and in the studies of the researcher Adam Kendon [5]. The reason two sets of gestures were designed and implemented is that people typically alternate between different movements when interacting with others. The two gestures differ mainly in that they are mirrored versions of each other, but some other minor changes are also present in order to mimic natural human behaviours more closely. In the first gesture, the robot tilts slightly backward to its right side while it looks up towards the left, at the same time closing its hands slightly. It lingers there for a split second, then straightens itself and looks back towards the user while opening its hands. In the second gesture, the robot again leans slightly to the right, but this time looks down to the left, closing its hands as it does so. It then straightens itself, leans slightly forward and looks up towards the user while opening its hands.

Figure 3: First gesture, used in the first and last rounds

Figure 4: Second gesture, used in the second round

Four animation frames were chosen from each of the implemented gestures and can be seen in figure 3 and figure 4. The red colour corresponds to the previous animation frame and can be used to compare the differences to the current frame. In figure 3, it is clear that the robot mostly moved its head and arms between each animation frame. In figure 4, the robot instead moved its head and leaned backward and forward rather than making larger movements with its arms.

Comparing the two gestures to each other, it is clear that the second gesture (figure 4) used more subtle movements than the first gesture (figure 3).
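As the thesis does not include the gesture code, the following is a minimal sketch of how the first gesture could be keyframed with the NAOqi Python SDK, the framework Pepper is programmed with. The joint names are real Pepper joints, but the robot address, angle values, signs and timings are illustrative assumptions, not the values used in the study:

```python
# -*- coding: utf-8 -*-
from naoqi import ALProxy  # NAOqi Python SDK (Python 2.7)

PEPPER_IP = "192.168.1.10"  # assumed robot address
motion = ALProxy("ALMotion", PEPPER_IP, 9559)

# Keyframes for the first loading gesture: lean slightly back towards
# the right, gaze up to the left and half-close the hands; linger;
# then straighten up, look back at the user and reopen the hands.
names = ["HipRoll", "HipPitch", "HeadYaw", "HeadPitch", "LHand", "RHand"]
angles = [
    [-0.08, -0.08, 0.00],  # HipRoll: tilt towards the right side (rad)
    [ 0.06,  0.06, 0.00],  # HipPitch: slight backward lean (sign assumed)
    [ 0.35,  0.35, 0.00],  # HeadYaw: positive turns the head left
    [-0.25, -0.25, 0.00],  # HeadPitch: negative looks up
    [ 0.30,  0.30, 0.60],  # LHand: half-closed, then opened (0..1)
    [ 0.30,  0.30, 0.60],  # RHand
]
times = [[1.2, 2.8, 4.0]] * len(names)  # keyframes within the 4 s window
motion.angleInterpolation(names, angles, times, True)  # blocking call
```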


The study

Six participants initially thought that the robot could take voice input, and some seemed disappointed when they were told that they had to type manually on the screen; four of these were in the gestures group and two in the static group. One of these participants remarked, disappointed, in Swedish during their interaction that "It [the robot] needs to be able to understand talking".

The average interaction time for the static group was 86 seconds (1.4 minutes) and the median was 51 seconds (0.85 minutes). Three of the participants did not properly end their interaction with the robot, which somewhat extended the total interaction time. The average interaction time for the gestures group was 111 seconds (1.85 minutes) and the median was 87 seconds (1.45 minutes). Five of these did not properly end their interaction. The gestures group was about 1.5 times slower than the static group even though the loading time was identical. This difference is inflated because the gestures group had more participants who did not end their interactions. Excluding the participants who did not end their interactions, the static group would have an average of 54 seconds and a median of 50 seconds, and the gestures group an average of 62 seconds and a median of 53 seconds. Hence, there would still be a difference, with the gestures group being slower, but the difference is not as large.
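To make the exclusion analysis above concrete, a few lines of Python suffice. The duration records below are placeholder values for illustration, not the study's raw data:

```python
from statistics import mean, median

def interaction_stats(records, exclude_unended=False):
    """Mean and median interaction time in seconds; optionally exclude
    participants who did not properly end their interaction."""
    durations = [secs for secs, ended in records
                 if ended or not exclude_unended]
    return mean(durations), median(durations)

# (duration in seconds, ended properly?) -- placeholder values only.
static_records = [(45, True), (50, True), (55, True), (194, False)]
print(interaction_stats(static_records))                        # all participants
print(interaction_stats(static_records, exclude_unended=True))  # exclusion analysis
```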

The participants in both test groups tended to look away from the robot more during the first and last rounds, see figure 5. Given that eight participants did not properly end their interactions with the robot, five in the gestures group and three in the static group, this added to the count of gaze aversions for the last round. For the gestures group, this assumption is supported by the higher standard deviation in the last round, see Round 3 in figure 5. If we exclude the eight participants who did not end their interactions, then during the last round (round 3) the gestures group looked away 36 times in total, instead of 98, while the static group looked away 10 times, instead of 22.

In total, the gestures group looked away more than twice as much as the static group: 2.4 times as much during the first round, about 2.2 during the second round and 4.5 during the final round. Excluding the eight participants who did not end their interactions, the gestures group looked away 3.6 times as much as the static group during the final round. However, this is an unbalanced comparison of seven participants in the gestures group versus nine in the static group. Comparing the median scores of the two groups, excluding the eight participants who did not end their interactions, the gestures group tended to look away twice as much as the static group during the final round of interactions.

Figure 5: Average count of gaze aversion for both groups during each round. (Bar chart of the average number of times users looked away from Pepper; static group: 1.8, 1.3, 1.8 and gestures group: 4.4, 2.9, 8.2 in rounds 1-3.)

Past interaction questionnaire

The median score for the first survey statement, "Q1. I was in a hurry to get here today", was three out of five for the static group versus four out of five for the gestures group, see figure 6. The gestures group felt they were more in a hurry than the static group, which could be an indication that they were more stressed upon their arrival and check-in. Participants in the static group stated in the survey that the "Subway was late" and that they "Gotta work". Given that the participants in the gestures group were more in a hurry, they may have been less patient than the static group. That some participants did not correctly end their interactions with the robot could also be an indication that they were in a hurry to move on from the check-in process. They could have been in such stress that they did not read the screen, or simply skipped clicking the "Good bye" button on the robot's screen to end their interaction.

The median score for statement two, "Q2. I feel that the check-in process was intuitive", was four out of five for the static group, while the gestures group had a median score of three. The static group thus felt that the check-in process was more intuitive than the gestures group did.

There were two participants, one in each test group, who had trouble hearing the robot. As the robot's speech was crucial for the participants to understand what input to give, this was quite unfortunate.


Figure 6: Median values of survey statements Q1-Q5 for both the static and gestures groups (five-point Likert scale from "Strongly disagree" (1) to "Strongly agree" (5); the statements are listed in Appendix A)

The third statement, "Q3. M1 (the robot) felt human-like", got a median score of 2.5 out of five for the static group, while the gestures group gave a slightly lower score of two out of five. Neither of the groups felt that the robot was very human-like, but the static group felt it was slightly more like a human. One participant in the gestures group elaborated on the statement with "it's still [just] a robot". One participant in the static group said it "looked me in the eyes", which they explained contributed to perceiving Pepper as more human-like.

For statement four, "Q4. The check-in process was quick", the median score was four out of five for the static group, while the gestures group had a median of three out of five. More participants in the static group than in the gestures group thought the check-in process was quick. Given that the static group in total completed the check-in about 1.5 times faster than the gestures group, this could have contributed to the more positive score on this statement.

The static group gave a median score of 4.5 out of five for the fifth statement, "Q5. I would use M1 (the robot) to check-in again". The gestures group gave a slightly lower median score of 3.5. The static group thus had a more positive attitude towards using the robot for checking in again in the future. A participant in the static group said "it was easy [to check in]" and would therefore use the robot again. One participant from the first day of testing also revisited and reused the robot on the other two days of testing, each time preferring to use the robot to check in (only the first encounter was included in the study). However, one participant elaborated that they would be more inclined to use the robot again only if it had voice recognition as input.

A majority of the participants conducted the test during the mornings. Thirteen participants, six from the gestures group and seven from the static group, interacted with the robot between 8 AM and 11 AM. Between 11 AM and 1 PM, eight participants interacted with the robot, five from the gestures group and three from the static group. From 1 PM to 5 PM, three participants took part in the study, one from the gestures group and two from the static group.

Figure 7: Median values of survey statements Q1-Q5 for the gestures group during different times of the day (8 AM - 11 AM versus 11 AM - 5 PM)

Figure 8: Median values of survey statements Q1-Q5 for the static group during different times of the day (8 AM - 11 AM versus 11 AM - 5 PM)

Generally, the static group was less in a hurry, both in the morning and after 11 AM, and rated the statements more positively than the gestures group. Both groups tended to give slightly more positive ratings on statements 2-5 in the morning, while only the gestures group was more in a hurry in the mornings. This means that being less in a hurry does not necessarily equal being more positive towards the robot and its interactions.

Comparing figure 7 and figure 8, some differences are visible between the scores of the morning participants and the after-11 AM participants in both groups. The gestures group tended to be more in a hurry (Q1) earlier in the day, while the static group had the same median score in the morning and after 11 AM. Both groups thought that the check-in process was slightly more intuitive (Q2) in the morning. The gestures group rated how human-like the robot felt (Q3) consistently throughout the day, while the static group rated the robot as more human-like in the morning. The gestures group rated the check-in process as quicker (Q4) in the morning, while the score was consistent throughout the day for the static group. For the fifth statement, both groups were more positive in the mornings about using the robot again in the future.


On the question of what the participants disliked about the robot, two stated that they had trouble hearing it, one from each test group. One of them wrote that it was "difficult to hear because of the echo in the room", and the other thought that the "sound volume could be higher". Given that the study was conducted in-the-wild, in a real reception, there was at times a lot of movement and noise in the room, which could have affected the participants in different ways, for example by making it harder to hear the robot speaking. One participant mentioned that they had difficulties inputting information on the screen. They wrote "A bit difficult to write on the ipad since it was morning". This could have been because of the rain and cold on that particular day; the participant could have arrived wearing gloves, making it more of a hassle to check in, as they would need to take the gloves off to use the robot's screen.

5 DISCUSSION

The purpose of this study was to evaluate how users could be affected by a loading time with gestures on a humanoid robot, and what gestures would be suitable for loading-time interactions. The results indicate that the gestures implemented for this study did not improve the overall user experience. On the contrary, they had a negative impact on the overall experience, and the participants tended to score more positively when interacting with the static robot. There are many possible reasons for this outcome. For example, the participants' ages might have affected the results of the study.

The average age of the gestures group was about ten years higher than that of the static group. The older people are, the more likely they are to have become used to certain standards of interaction. Having set expectations on how to interact with a technology could negatively affect the user experience, which Haering, Thomaschke and Persson argued for when discussing predictive delays [14, 20].

The static group gazed away from the robot around half as much as the gestures group did. Given that the gestures group spent 1.5 times longer interacting with the robot, it seems likely that they were more prone to being absent-minded or getting distracted during their interactions, and therefore looked away from the robot more. The gestures group answered, on average, that they felt more stressed upon their arrival than the static group, which could be an indicator that they were more prone to being distracted.

That the gestures group was possibly more absent-minded, as well as more stressed, than the static group during their interactions could contribute to why they tended to rate their interactions with the robot more negatively. Being more stressed while interacting with the robot could make a participant less patient and lead them to expect quicker interactions. However, when comparing the questionnaires of participants who interacted with the robot in the mornings versus after 11 AM, it is visible that being less stressed does not directly equal a more positive experience. The morning gestures group was more stressed but gave more positive ratings than the gestures-group participants after 11 AM.

The gestures-group participants who interacted with the robot before 11 AM thus tended to rate their interaction slightly more positively than those who interacted after 11 AM, and the static group's experience of using the robot before and after 11 AM did not differ as much as the gestures group's. One might have expected the opposite, as the gestures-group participants in the mornings were more stressed than those after 11 AM, while the static group was equally stressed throughout the day yet also rated higher in the mornings. Based on the results of this study, the reasons for this can only be speculated upon and could be researched further. A possible reason might be that the participants arriving in the morning had different schedules than those arriving later in the day, which might have affected their mindset at the time.

Both professionals said that exaggerated movements could be used, but that they should then be obvious and made intentionally. It should be clear that the movement is an exaggeration, and it could therefore be beneficial to avoid imitating human behaviours in uses and scenarios other than loading. Participants might have had other expectations of how the robot would move, as Pepper in many ways does not look like a human. Therefore, using movements that try to imitate natural human behaviour could instead be confusing if not done correctly. One participant said "It moved strangely for no purpose". A possible reason why they thought the movements were "strange" and had "no purpose" might be that the movements were too subtle, or more subtle than they expected or anticipated. A future study could therefore conduct a similar evaluation but instead implement more exaggerated gestures, such as the robot stroking its chin or scratching its head. The participants might also have misinterpreted the purpose of the movements, as the implemented gestures might have another meaning to them and were confusing when used in this context.

Method Criticism

Changing any of the parameters used during the study could have yielded different results. Changing the loading time would likely affect the user experience; four seconds of loading might have been too long to rely solely on body gestures as a loading indicator. In normal human-to-human communication, people usually say that they "need to think about it" when they need a longer time to think. It might be worthwhile to study whether this is applicable in human-robot interactions as well.


The duration of the loading time is likely also dependent on the complexity of the question asked. For example, most people would be confused if someone said their name and the person they were talking to needed time to think about that name. However, if someone were asked a complex question, it would only be natural for them to need more time to think of an answer. As this study used a four-second loading time for all questions, this might have contributed to the users perceiving Pepper as less human-like.

In the diagram of Mori's uncanny valley, figure 1, a humanoid robot is positioned at around 50% human likeness and is perceived as around half as positively familiar as a "healthy person". What kind of humanoid robot Mori is referring to is unclear, but there are various degrees of how human-like humanoid robots are, and this will also depend on how the robot moves and behaves. The robot used in this study, Pepper, does have some human features, such as arms and fingers, but it is also coloured white, has a screen on its chest and has static facial expressions, which make it clearly non-human. Most participants did not feel that Pepper was very human-like (see Q3, figure 6). Based on the results of this study, it is possible that the subtle human-like gestures that were implemented might have had a slight negative impact on the robot's perceived human-likeness. I believe that a contributing factor to why the participants generally perceived Pepper as non-human-like is that the overall interaction was quite unnatural.

The participants might have perceived Pepper as more human-like if it could take more voice commands and the users could give input using speech instead of pressing the robot's screen, which is clearly not a natural way of interacting between humans. Being focused on entering input on Pepper's screen might add to Pepper feeling more like a tool than feeling human-like. This might lead to some users focusing solely on the screen instead of on the full interaction with Pepper. For example, I noticed a few participants being so focused on the screen that they might not have noticed the robot moving at first, which seemingly surprised them.

The interaction starts with Pepper asking the user for a name, and given that it does not explicitly say to input the answer on the screen, this could set the expectation that it would be possible to speak to it. Not being able to follow users' expectations could affect the user experience negatively [20]. For future studies, I would suggest conducting a study where the user does not use a screen for input. Removing the screen from Pepper could make the user more attentive to bodily changes in the robot rather than focused on the screen.

Participants who did not end their interactions properly tended to gaze away from the robot more than those who did end the interaction. A reason for this might be that they were more in a hurry or were less attentive to the interaction.

Another reason could be that once the robot says "Have a nice day", the interaction might feel as if it has already ended for the user, as that is how interactions can end between humans. In hindsight, it would have been better to end the data collection when Pepper said "Have a nice day" rather than when the user pressed "Good bye", as this led to higher deviations in the counts of how much the participants looked away from the robot during round 3 of the interactions, see figure 5.

Due to the Pepper robot's limited freedom of movement, some of the movements that were discussed with the professionals were not possible to implement. Trying to mimic the "impulse" that the professionals mentioned, a movement starting from the chest, was cumbersome, as the robot's shoulders are static and it only has a few joints in the central part of its body, around its waist. The technology is still lacking, and it would be very difficult to mimic such an impulse using a humanoid robot today.

Future Work

This study implemented gestures that attempted to imitate natural human behaviours that people tend to exhibit when they are thinking. The study conducted by Ohshima et al. showed positive effects when more exaggerated thinking gestures were implemented (although used in combination with conversational fillers), which leads to a discussion of whether it is beneficial for humanoid robots to imitate human gestures or not. Implementing exaggerated movements could be more beneficial than subtle movements, considering some current limitations of robot technology. Pepper and many other humanoid robots are still limited in their movements and are clearly non-human. It could therefore be more beneficial to implement less human-like movements, to distance the association with humans and set more realistic expectations for the interaction.

As humanoid robots are a new field and there are few set standards in the robotics industry, any previous experience that participants might have had could have impacted their expectations while performing the study. Predictive coding and predictive delay can both affect a user's experience for better or worse [17, 20]. It could be beneficial to thoroughly study what people expect of an interaction with a robot or humanoid robot. For example, do people expect humanoid robots to behave like humans, do they expect it for certain tasks, or perhaps when the robot is situated in certain environments? In order to design gestures that will improve the user experience, it would be beneficial to first understand the underlying expectations of the users. It could also be beneficial to study whether previous interactions with other more, or less, human-like robots could affect the expected robot interactions with Pepper.


6 CONCLUSIONS

The gestures used in this study did not show any improvement in the overall user experience; instead, the implemented gestures had a negative effect. Some contributing factors could, for example, be previous experiences of interacting with humanoid robots, or the participants' stress levels when conducting the test.

The gestures group gazed away around twice as much, and spent about 1.5 times as long on the interaction, as the static group did. The time spent on the interaction could correlate with how much the users look away from the robot. Users' stress levels could also correlate with how much they look away from the robot.

Previous studies conducted on robot loading interactions examined more exaggerated movements in combination with conversational fillers, which resulted in an improved user experience. The more subtle gestures used in this study did not show improvements in user experience. A reason could be that the gestures did not match the participants' perceptual expectations during their interactions, or an effect of the uncanny valley. It could therefore be beneficial to conduct further studies using exaggerated movements, with and without conversational fillers.

For future studies, conducting the study in a controlled environment would be beneficial to minimize the participants' stress levels and evaluate whether stress affects their patience with the robot and their overall user experience of loading gestures. Testing other gestures, for example more exaggerated or comical gestures, could affect the user experience differently.

7 ACKNOWLEDGEMENTS

I want to send special thanks to the people who have helped me during this project. Many thanks to my wonderful supervisor, Björn Thuresson, for your guidance and wisdom. Thank you to the two creative and inspiring theatre professionals, Ulf Wahlström Rönell and Martin Hasselgren. I want to thank Julian for helping me program Pepper. Lastly, thank you Kerry and Lisa for supporting me at the bank.

BIBLIOGRAPHY

[1] Geoffrey W Beattie. 1979. Planning units in spontaneous speech: Some evidence from hesitation in speech and speaker gaze direction in conversation. Linguistics 17, 1-2 (1979), 61–78.

[2] Tammy Everts. 2016. Time is Money: The Business Value of Web Performance. O'Reilly Media, Inc.

[3] Richard Langton Gregory. 1980. Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London. B, Biological Sciences 290, 1038 (1980), 181–197.

[4] Takayuki Kanda, Masahiro Shiomi, Zenta Miyashita, Hiroshi Ishiguro, and Norihiro Hagita. 2009. An affective guide robot in a shopping mall. In Proceedings of the 4th ACM/IEEE International Conference on Human Robot Interaction. ACM, 173–180.

[5] Adam Kendon. 1967. Some functions of gaze-direction in social interaction. Acta Psychologica 26 (1967), 22–63.

[6] Chao Liu, Ryen W. White, and Susan Dumais. 2010. Understanding Web Browsing Behaviors Through Weibull Analysis of Dwell Time. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '10). ACM, New York, NY, USA, 379–386. https://doi.org/10.1145/1835449.1835513

[7] Karl F MacDorman. 2005. Androids as an experimental apparatus: Why is there an uncanny valley and can we exploit it? In CogSci-2005 Workshop: Toward Social Mechanisms of Android Science, Vol. 106118.

[8] Howard Maclay and Charles E Osgood. 1959. Hesitation phenomena in spontaneous English speech. Word 15, 1 (1959), 19–44.

[9] Noriaki Mitsunaga, Christian Smith, Takayuki Kanda, Hiroshi Ishiguro, and Norihiro Hagita. 2008. Adapting robot behavior for human-robot interaction. IEEE Transactions on Robotics 24, 4 (2008), 911–916.

[10] Chika Nagaoka. 2003. Mutual congruence of vocal behaviour in cooperative dialogues; comparison between receptive and assertive dialogues. Human Interface, 2003 (2003), 167–170.

[11] Naoki Ohshima, Keita Kimijima, Junji Yamato, and Naoki Mukawa. 2015. A conversational robot with vocal and bodily fillers for recovering from awkward silence at turn-takings. In 2015 24th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 325–330.

[12] Parmy Olson. 2018. Softbank's Robotics Business Prepares To Scale Up. (2018).

[13] Roy L. Patterson. 2016. The Impact of Web Performance on E-Retail Success. (2016).

[14] Samantha Persson. 2019. Improving perceived performance of loading screens through animation. (2019).

[15] Laura M Pfeifer and Timothy Bickmore. 2009. Should agents speak like, um, humans? The use of conversational fillers by virtual agents. In International Workshop on Intelligent Virtual Agents. Springer, 460–466.

[16] Aaron M Sackett, Tom Meyvis, Leif D Nelson, Benjamin A Converse, and Anna L Sackett. 2010. You're having fun when time flies: The hedonic consequences of subjective time progression. Psychological Science 21, 1 (2010), 111–117.

[17] Ayse Pinar Saygin, Thierry Chaminade, Hiroshi Ishiguro, Jon Driver, and Chris Frith. 2011. The thing that should not be: predictive coding and the uncanny valley in perceiving human and humanoid robot actions. Social Cognitive and Affective Neuroscience 7, 4 (2011), 413–422. https://doi.org/10.1093/scan/nsr025

[18] Paula R Selvidge, Barbara S Chaparro, and Gregory T Bender. 2002. The world wide wait: effects of delays on user performance. International Journal of Industrial Ergonomics 29, 1 (2002), 15–20.

[19] Toshiyuki Shiwa, Takayuki Kanda, Michita Imai, Hiroshi Ishiguro, and Norihiro Hagita. 2008. How quickly should communication robots respond? In Proceedings of the 3rd ACM/IEEE International Conference on Human Robot Interaction. ACM, 153–160.

[20] Roland Thomaschke and Carola Haering. 2014. Predictivity of system delays shortens human response time. International Journal of Human-Computer Studies 72, 3 (2014), 358–365.

[21] Cornelia S Wendt and Guy Berg. 2009. Nonverbal humor as a new dimension of HRI. In RO-MAN 2009 - The 18th IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 183–188.

[22] Noel Wigdor, Joachim de Greeff, Rosemarijn Looije, and Mark A Neerincx. 2016. How to improve human-robot interaction with Conversational Fillers. In 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 219–224.

[23] Wenguo Zhao, Yan Ge, Weina Qu, Kan Zhang, and Xianghong Sun. 2017. The duration perception of loading applications in smartphone: Effects of different loading types. Applied Ergonomics 65 (2017), 223–232.


A EVALUATION METHOD - SURVEY

Past interaction questionnaire

Master thesis survey about users' experience of interacting with M1 (the robot)

ID (5 letters located at the bottom of your badge): __________
Gender: ☐ Male ☐ Female ☐ Unspecified
Age: _____

How strongly do you agree/disagree with the following statements? (Strongly agree (5) - Agree (4) - Neutral (3) - Disagree (2) - Strongly disagree (1))

1. I was in a hurry to get here today
2. I feel that the check-in process was intuitive
3. M1 (the robot) felt human-like
4. The check-in process was quick
5. I would use M1 (the robot) to check-in again

Why? Feel free to use the backside of this paper for your answers.

What did you like or dislike about M1?

Optional! Please leave your mail or phone nr if you are open to be contacted about your answers (anonymously of course).


B FIRST SET OF GESTURES


C SECOND SET OF GESTURES


www.kth.se

TRITA-EECS-EX-2020:14