
Paper:

Motion Segmentation and Recognition for Imitation Learning and Influence of Bias for Learning Walking Motion of Humanoid Robot Based on Human Demonstrated Motion

Yasutake Takahashi∗, Hiroki Hatano∗, Yosuke Maida∗∗, Kazuyuki Usui∗∗, and Yoichiro Maeda∗∗∗

∗Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
E-mail: {yasutake, hhatano}@ir.his.u-fukui.ac.jp

∗∗Department of Human and Artificial Intelligent Systems, Faculty of Engineering, University of Fukui
3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
E-mail: [email protected]

∗∗∗Department of Robotics, Faculty of Engineering, Osaka Institute of Technology
5-16-1 Omiya, Asahi-ku, Osaka 535-8585, Japan
E-mail: [email protected]

[Received April 24, 2014; accepted May 27, 2015]

Two main issues arise in practical imitation learning by humanoid robots observing human behavior: the first is segmenting and recognizing motion demonstrated naturally by a human being, and the second is utilizing the demonstrated motion for imitation learning. Specifically, the first involves motion segmentation and recognition based on the humanoid robot's motion repertoire for imitation learning, and the second introduces a learning bias based on demonstrated motion into the humanoid robot's imitation learning to walk. We show the validity of our motion segmentation and recognition in a practical setting and report the results of our investigation of the influence of the learning bias in humanoid robot simulations.

Keywords: motion segmentation and recognition, imitation learning, learning bias, humanoid robot, via-point representation

1. Introduction

Commercially available humanoid robots show many kinds of behavior, mostly created through graphical user interfaces that build new motion based on via-point representation. A user defines a set of via-points, each representing a set of joint angles that defines a posture of the humanoid robot. The robot follows the via-points one by one to generate a specific motion. The process of designing and modifying motion using via-point representation is, unfortunately, difficult and time-consuming, so imitation learning has drawn attention [1–5]. It enables users to add and design new robot behavior. Most imitation learning for these robots focuses on mimicking the trajectory of joint angles to realize the demonstrated behavior. Precisely following the joint angle trajectory of the demonstrated motion is insufficient, however, because the robot's link structure and dynamics differ from those of a human demonstrator. Robot walking needs stability to keep the robot upright while following the demonstrated trajectory of joint angles. Overcoming this imitation gap between robots and human demonstrators requires additional learning to adapt the imitated motion to the robot's body dynamics. Miyamoto and Kawato [1] show imitation learning through via-point representation for a tennis serve and the upswing of a robot arm. Via-point representation is widely used in many robots because it needs fewer parameters than other representations of complex motion.

Reinforcement learning has been studied widely in robotics [6–8] because it enables a robot to acquire purposive behavior through trial and error. Applying simple, naive reinforcement learning to a robot, however, requires long learning time at high cost, making it desirable to introduce imitation into reinforcement learning to reduce learning time and unnecessary trial and error. Takahashi et al. [9] showed that a wheeled soccer robot enhances behavior acquisition and recognition based on the interaction between behavioral learning and observation. The robot learns in several ways, including trial and error and reading rewards from observed teammate behavior. Hamahata et al. [10] proposed designing a reward system based on observing a human demonstration and applied it to learning on an inverted pendulum robot. These studies use reinforcement learning based on estimated state values. Value-based reinforcement learning, however, may involve issues such as the curse of dimensionality and relatively long learning time.

Policy gradient methods without state value estimation have the advantages of handling continuous state spaces, a high degree of freedom, and a short learning time, although they do not guarantee convergence to the optimal policy. Kohl and Stone [11] show the validity of the policy gradient method with a real four-legged robot. Conventional approaches to policy-gradient-based

Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.19 No.4, 2015, p. 532


reinforcement learning for imitation use a demonstrated motion trajectory to initialize the motion parameters to be learned. Finding good initial parameters is important for reaching an optimal policy within reasonable learning time. Takahashi et al. [12, 13] show that an inverted-pendulum mobile robot acquires a good set of initial parameters for learning a kicking motion by observing a demonstration by a human being. Miyamoto and Kawato [1] used a demonstrated trajectory as an initial parameter set of via-points for learning, but it is not clear how a demonstrated motion affects learning performance when it is used as a bias of reinforcement learning, as imitation learning, rather than as initial learning parameters.

A further point in imitation learning is the available training data. Most imitation learning studies [2–5] assume that the demonstrator's joint angles are available as training data. Optical or mechanical motion capture is necessary for measuring the joint angles of a demonstrator, but this preparation tends to be expensive and time-consuming. Powerful, inexpensive motion capture systems are available, such as KINECT from Microsoft Corporation. The KINECT device measures the positions of human joints but does not provide joint angle information directly. It is difficult to solve the inverse kinematics for calculating joint angles from joint positions because of redundancy and differences in body link structure between human demonstrators and humanoid robots. Imitation learning should avoid the unnecessary cost of solving inverse kinematics to acquire joint-angle trajectories from demonstrations. Moreover, imitation learning in a practical situation does not require a human demonstrator to declare the start and the end of a particular motion; the robot must segment and recognize the motion itself.

This paper proposes two methods for practical imitation learning: (1) motion segmentation and recognition based on the motion repertoire of the humanoid robot for imitation learning, and (2) imitation learning based on a reasonable bias for a humanoid robot with policy-gradient-based reinforcement learning. Method (1) recognizes the motion the demonstrator shows and, if the demonstrated motion is periodic, its cycle period and phase. For method (2), we investigate learning performance based on a bias derived from the motion automatically segmented in (1). A humanoid robot observes a human walking with motion capture and acquires the joint trajectory. Only a walking sequence is extracted from the motion data, based on the motion repertoire of the humanoid robot, for imitation learning. The learning bias is designed with body link posture trajectories calculated from the joint trajectory. Learning performance is investigated by varying the weight of the learning bias during imitation learning. Experimental results show that motion segmentation operates appropriately and indicate that the learning bias contributes to better exploration of learning parameters and to motion stability during learning.


Fig. 1. Imitation learning flow.

2. Overview

Figure 1 shows the imitation learning flow. The humanoid robot first observes a human demonstration with a motion capture device, such as the KINECT sensor. The device records a sequence of the human demonstrator's joint positions in camera coordinates. Because it is difficult to estimate joint angles uniquely from joint position information, body link postures are used to compare human and robot motion.

The recorded sequence of link postures includes motions such as standing, walking, bowing, and hand waving. This sequence is divided into motions based on the robot's motion repertoire, in which individual motions are defined using the via-point representation of joint angles. Motion segmentation is based on the difference in link posture sequences between the human being and the robot. The motion that shows the smallest difference is labeled in the sequence of demonstrated human motion.

The segmented sequence of human link postures is introduced into motion learning as a bias for imitating behavior. Reinforcement learning is then applied to improve the motion using the body dynamics of the robot itself by updating the via-points of the joint angle trajectories. The learning bias derived from the segmented human motion sequence is used to improve learning performance, similar to the way a trainer shows a model motion to a trainee.

3. Motion Recognition and Segmentation

A learning robot must recognize and segment observed motion demonstrated by a human being to choose which motions to learn, as shown in the overview in Fig. 2. We assume that our humanoid robot has a set of basic motions to be improved by imitation learning. The robot has five motions: standing, walking, bowing, waving the right hand, and



Fig. 2. Motion recognition system.

Fig. 3. Joints detected by KINECT and links whose postures are calculated from the joints.

waving the left hand, each of which is described using via-point representation, and the sequence of link postures during the motion is recorded beforehand. One of the modules shown in Fig. 2 is assigned to each motion and its link posture sequence.

The motion capture system observes the human demonstration, measures the human joint positions, and converts the position data into the robot coordinate system, making the link posture sequence during the demonstrated motion available. The link posture sequence is compared to that of each robot motion and the similarity of the link posture sequences is calculated. The motion with the highest similarity is regarded as the motion demonstrated by the human being.

3.1. Link Posture Estimation

It is difficult to determine joint angles from the sequence of observed joint positions because of redundancy in degrees of freedom. It takes time to find a reasonable joint angle sequence because the exploration space is huge. To avoid this issue, we use link posture estimation instead of joint angle estimation, because a link posture is calculated uniquely, without the redundancy problem.

Figure 3 shows the joints detected by KINECT and the links whose postures are calculated from the joints. Our KINECT-based motion capture system detects 16 joints. Links are defined and constructed using adjacent joints as shown in Fig. 3.

Fig. 4. Camera coordinate system cΣ and robot coordinate system rΣ.

We define the camera and robot coordinate systems as shown in Fig. 4. The motion capture system records joint positions in the camera coordinate system, which are then translated into the robot coordinate system. The origin of the robot coordinate system is the center of the hip joints, HIP L and HIP R. The z direction of the robot coordinate system is vertically upright, whereas the x direction is defined as the direction from joint HIP L to joint HIP R ignoring the z component. The y direction of the robot coordinate system is defined as the cross product of z and x, that is, y = z × x.
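This frame construction can be sketched in a few lines. The joint names and numeric hip positions below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def robot_frame(hip_l, hip_r):
    """Build the robot coordinate axes described above: z is vertically
    upright, x points from HIP L to HIP R with its z component ignored,
    and y = z x x."""
    z = np.array([0.0, 0.0, 1.0])
    x = np.asarray(hip_r, dtype=float) - np.asarray(hip_l, dtype=float)
    x[2] = 0.0                      # ignore the z direction
    x /= np.linalg.norm(x)
    y = np.cross(z, x)              # y = z × x
    return x, y, z

# Hypothetical hip-joint positions already translated from camera coordinates.
x, y, z = robot_frame([0.0, 0.0, 0.9], [0.2, 0.1, 0.95])
```

The resulting axes form a right-handed orthonormal frame regardless of how tilted the hip line is, because the vertical component of x is discarded before normalization.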

A link posture in the robot coordinate system is defined by the two adjacent joints shown in Fig. 3. The posture of link i, consisting of joints l and m, is defined in Eq. (1).

ϕ_i = (p_l − p_m) / ||p_l − p_m|| . . . . . . . . . . (1)

where p_l and p_m are the positions of joints l and m in the robot coordinate system, and || · || indicates the Euclidean norm. The experiments that follow use the 16 joints and 11 links shown in Fig. 3.
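Eq. (1) amounts to normalizing the difference of two joint positions. A minimal sketch, with illustrative joint positions:

```python
import numpy as np

def link_posture(p_l, p_m):
    """Eq. (1): unit vector from joint m to joint l, the posture of link i."""
    d = np.asarray(p_l, dtype=float) - np.asarray(p_m, dtype=float)
    return d / np.linalg.norm(d)    # Euclidean norm

# A vertical shin link: the knee directly above the ankle gives a posture
# pointing straight up, independent of the link's length.
phi = link_posture([0.0, 0.0, 0.45], [0.0, 0.0, 0.05])
```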

3.2. Similarity Based on Link Posture

The link posture sequence of the human demonstrator, hϕ_i(t), is recorded by the motion capture system. The robot's sequence rϕ_i(t) is recorded while the robot actually executes the motion based on the joint trajectory represented by via-points. The similarity of link postures hϕ_i(t) and rϕ_i(t) is defined as the inner product of these postures, that is, hϕ_i(t) · rϕ_i(t). The similarity of the two link posture sequences of the human demonstrator and the robot is defined as follows.

S = (1/NT) Σ_{t=1}^{T} Σ_{i=1}^{N} hϕ_i(t) · rϕ_i(t) . . . . . . . (2)

where N and T indicate the number of links and the length of the sequence.
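Since the link postures are unit vectors, Eq. (2) is just the mean inner product over all frames and links, bounded above by 1. A minimal sketch:

```python
import numpy as np

def similarity(h_phi, r_phi):
    """Eq. (2): S = (1/NT) * sum over t and i of h_phi_i(t) . r_phi_i(t).
    Both arguments are arrays of shape (T, N, 3) holding unit link-posture
    vectors for T frames and N links."""
    T, N, _ = h_phi.shape
    return float(np.sum(h_phi * r_phi) / (N * T))

# Identical sequences of unit vectors give the maximum similarity S = 1.
seq = np.tile(np.array([0.0, 0.0, 1.0]), (100, 11, 1))   # T = 100, N = 11
```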


Fig. 5. Schematic example of (a) an observed motion data sequence demonstrated by a human being and (b) the autocorrelation function G(τ) of the sequences.

3.3. Periodic Motion Detection

The human demonstrator often shows periodic motion such as walking or waving hands. A sequence of motion data must be segmented into one cycle, and the phase and period should be determined. The autocorrelation function reasonably finds the period of the demonstrated motion. The autocorrelation function of the demonstrated motion, G(τ), is as follows:

G(τ) = (1/NT) Σ_{t=0}^{T−1} Σ_{i=0}^{N−1} hϕ_i(t) · hϕ_i(t + τ) . . . (3)

τ indicates the shift time, T the length of the window for calculating the autocorrelation, and N the number of links. Fig. 5(a) shows a schematic example of the sequence of the observed motion demonstrated by a human being. The black line indicates the original sequence of the observed posture of a link. The grey line shows the sequence shifted by time τ. The autocorrelation function shows a local maximum at the τ that equals the period of the demonstrated motion, as shown in Fig. 5(b). The τ giving the first local maximum of G(τ), excluding τ = 0, is defined as the cycle period of the demonstrated motion, τ_m.
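The period search above can be sketched as follows. The synthetic one-link data and the window handling are assumptions made for illustration:

```python
import numpy as np

def cycle_period(h_phi, max_shift):
    """Return tau_m, the first local maximum of G(tau) in Eq. (3)
    excluding tau = 0, or None if no local maximum is found."""
    T, N, _ = h_phi.shape
    win = T - max_shift                       # frames where t + tau stays valid
    G = np.array([np.sum(h_phi[:win] * h_phi[tau:tau + win]) / (N * win)
                  for tau in range(max_shift)])
    for tau in range(1, max_shift - 1):
        if G[tau - 1] < G[tau] >= G[tau + 1]:
            return tau
    return None

# Synthetic link posture rotating in the horizontal plane with a 20-frame period.
t = np.arange(200)
ang = 2.0 * np.pi * t / 20.0
h_phi = np.stack([np.cos(ang), np.sin(ang), np.zeros_like(ang)], axis=-1)[:, None, :]
```

For this sequence G(τ) = cos(2πτ/20), which dips and then peaks again exactly at the 20-frame period, so the first interior local maximum recovers τ_m.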

The phase is important for recognizing the state of the motion correctly. Phase α is calculated using the correlation between the human and robot motions with time shift α. The correlation function is shown in Eq. (4).

S(α) = (1/NT) Σ_{t=0}^{T−1} Σ_{i=0}^{N−1} hϕ_i(t) · rϕ_i((τ/P) t + α) . . (4)

P indicates the period of one cycle of the robot motion. Fig. 6(a) shows an example of sequences of one cycle of observed


Fig. 6. Schematic example of (a) sequences of one cycle of observed motion data and robot motion data and (b) the similarity function based on α.

human motion data and robot motion data. The similarity function S(α) is maximized when phase α aligns the two motion sequences, as shown in Fig. 6(b). The phase α giving the maximum S(α) is regarded as the demonstrated motion phase α_m.
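Under the simplifying assumption that the robot's one-cycle sequence has already been time-scaled to the demonstrated period (the τ/P factor in Eq. (4)), the phase search reduces to scoring circular shifts:

```python
import numpy as np

def phase_align(h_phi, r_phi):
    """Return the shift alpha maximizing a discrete analogue of S(alpha)
    in Eq. (4) for two one-cycle sequences of equal length (T, N, 3)."""
    P = len(r_phi)
    scores = [float(np.sum(h_phi * np.roll(r_phi, -a, axis=0)))
              for a in range(P)]          # np.roll(r, -a)[t] == r[(t + a) % P]
    return int(np.argmax(scores))

# One cycle of rotating postures; the "robot" cycle lags the human by 5 frames.
t = np.arange(20)
ang = 2.0 * np.pi * t / 20.0
h = np.stack([np.cos(ang), np.sin(ang), np.zeros_like(ang)], axis=-1)[:, None, :]
r = np.roll(h, 5, axis=0)
```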

3.4. Trajectory Based on Via-Point Representation

The data size for representing the trajectory of raw joint angles of one motion is too large to learn, so via-point representation is widely used to reduce the learning parameter size [1]. Fortunately, many commercially available humanoid robots, including the Robovie-PC produced by Vstone Co., Ltd. [14] that we use in our experiments, use via-point representation to design motion. Fig. 7 shows an example of via-point representation for walking. Each number in the figure indicates the index of a via-point. The humanoid skeletons adjacent to the numbers indicate the robot postures at the via-points. A set of via-points represents points on the trajectory of the motion. A point between via-points is interpolated with, for example, spline interpolation or cubic curve interpolation. We use cubic curve interpolation in this paper because our robot uses it.

One motion is defined by a set of via-points. Via-point θ_t = (θ_t^1, θ_t^2, . . .) indicates the set of joint angles of the body at time t. The walking we design for our experiments, for example, has 15 via-points. A time t is defined for each via-point to control the motion speed.

Figure 8 shows a joint angle trajectory represented by via-points. The trajectory is the actual angle of the right waist joint of the humanoid robot while it walks two steps. Black dots indicate via-points and the solid line indicates interpolated joint angles.



Fig. 7. Example of via-point representation for walking: individual numbers indicate the indexes of the via-points. Humanoid skeletons indicate the robot posture at via-points.


Fig. 8. Trajectory of a joint angle represented by via-points during walking.

θ^i(t) = a_i t^3 + b_i t^2 + c_i t + d_i . . . . . . . . (5)

θ̇^i(t) = 3 a_i t^2 + 2 b_i t + c_i . . . . . . . . . (6)

Eqs. (5) and (6) show a cubic curve and its derivative. Two adjacent via-points at times t_1 and t_2 maintain the specified angles θ_{t1} and θ_{t2}. The coefficients a_i, b_i, c_i, and d_i are calculated by Eq. (7) if the times (t_1, t_2), angles (θ_{t1}^i, θ_{t2}^i), and angular velocities (θ̇_{t1}^i, θ̇_{t2}^i) of the ith joint at the two via-points are given.

(a_i, b_i, c_i, d_i)^T = M^{-1} (θ_{t1}^i, θ̇_{t1}^i, θ_{t2}^i, θ̇_{t2}^i)^T,
  where M = [ t_1^3  t_1^2  t_1  1 ;  3t_1^2  2t_1  1  0 ;  t_2^3  t_2^2  t_2  1 ;  3t_2^2  2t_2  1  0 ] . . . (7)
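Eq. (7) is a 4×4 linear solve per joint and via-point pair. A minimal sketch with illustrative boundary values:

```python
import numpy as np

def cubic_coeffs(t1, t2, th1, dth1, th2, dth2):
    """Solve Eq. (7): coefficients (a, b, c, d) of the cubic in Eq. (5)
    matching angle and angular velocity at the two via-points."""
    M = np.array([[t1**3,   t1**2, t1,  1.0],
                  [3*t1**2, 2*t1,  1.0, 0.0],
                  [t2**3,   t2**2, t2,  1.0],
                  [3*t2**2, 2*t2,  1.0, 0.0]])
    return np.linalg.solve(M, np.array([th1, dth1, th2, dth2]))

# Interpolate between two via-points with zero angular velocity at both ends.
a, b, c, d = cubic_coeffs(0.0, 1.0, 0.1, 0.0, -0.1, 0.0)
theta = lambda t: a*t**3 + b*t**2 + c*t + d       # Eq. (5)
dtheta = lambda t: 3*a*t**2 + 2*b*t + c           # Eq. (6)
```

Matching both the angle and its derivative at each via-point is what makes consecutive cubic segments join smoothly along the whole motion.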

4. Imitation Learning

We use the simple policy gradient method introduced by Kohl and Stone [11] to improve walking by updating the via-points that represent walking. The learning system starts from an initial set of via-points representing the robot's original walking and prepares test sets of similar via-points by adding small disturbances. It evaluates the test sets of via-points one by one, estimates the gradient of the walking evaluation with respect to the via-points, and then updates the via-points based on the estimated gradient. The learning system repeats this procedure and updates the motion parameters to reach a local maximum of the evaluation.

4.1. Policy Gradient Method

A set of via-points for the current motion is Θ = (θ_1, θ_2, . . .) = (θ_1^1, θ_1^2, . . . , θ_k^j, . . .), where k and j indicate the indices of via-points and joints. The learning system prepares M motion sets of similar via-points ^1Θ, ^2Θ, . . . , ^MΘ by adding small disturbances ε_k^j, 0, or −ε_k^j to the current angles of the motion via-points Θ.

^mΘ = (^mθ_1, ^mθ_2, . . .) = (^mθ_1^1, ^mθ_1^2, . . .) . . . (8)
^mθ_k^j = θ_k^j + r ε_k^j, where r ∈ {−1, 0, 1} . . . . (9)

It evaluates the sets of via-points ^mΘ one by one after the robot generates walking based on each set of via-points. The average evaluations for disturbance ε_k^j are estimated as follows:

• A_{k,+ε}^j is the average evaluation of the policies whose parameter ^mθ_k^j is θ_k^j + ε_k^j
• A_{k,0}^j is the average evaluation of the policies whose parameter ^mθ_k^j is θ_k^j
• A_{k,−ε}^j is the average evaluation of the policies whose parameter ^mθ_k^j is θ_k^j − ε_k^j

The set of via-points for walking Θ is then updated based on these averages as follows:

θ_k^j ← θ_k^j + η_k^j   if A_{k,+ε}^j > A_{k,0}^j and A_{k,+ε}^j > A_{k,−ε}^j
θ_k^j ← θ_k^j − η_k^j   if A_{k,−ε}^j > A_{k,0}^j and A_{k,−ε}^j > A_{k,+ε}^j
θ_k^j ← θ_k^j           otherwise . . . . . . . . . . (10)

η_k^j is the update step size for parameter θ_k^j. The learning system repeats this procedure and updates the set of via-points for the motion to reach a local maximum of the evaluation.

4.2. Evaluation of Motion

The evaluation for learning walking is designed based on three points: how fast the robot walks, the stability of the walking, and how similar the walking is to the demonstrated walking. The similarity to demonstrated walking, S_m, is defined based on the similarity S in Eq. (2) with the cycle period τ_m and phase α_m of the demonstrated motion. The walking distance D and the similarity S_m are measured at each trial. The evaluation for learning, E_L, is defined as follows:

E_L = w_S S_m + w_D D − P_L . . . . . . . . . (11)


Fig. 9. Recognition of the period of the cycle and phase of observed human walking motion: (a) from fast walking to slow; (b) from slow walking to fast.

where w_S and w_D are weights for similarity and walking distance. We set a constraint on the weights so that w_S + w_D = 1. P_L is a penalty against falling down during walking and is zero when the robot does not fall down in the trial. The similarity S_m works as a bias for imitation learning, and the degree of bias is controlled by the weight w_S.

The evaluation for learning, E_L, includes the similarity to demonstrated walking as a bias of imitation learning. To evaluate the performance of the learned walking itself, another evaluation, E_M, is defined as follows:

E_M = w_D D − P_M . . . . . . . . . . . . (12)

P_M is a penalty against falling during walking. The evaluation E_M shows the pure performance of the learned walking; it does not consider the similarity between the robot's walking and the demonstrated walking.
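Eqs. (11) and (12) combine into a few lines of arithmetic. The penalty magnitude of 1.0 below is an illustrative assumption, since the paper does not state it:

```python
def evaluations(distance, similarity_m, fell, w_s, penalty=1.0):
    """Eq. (11): E_L = w_S * S_m + w_D * D - P_L, with w_S + w_D = 1,
    and Eq. (12): E_M = w_D * D - P_M. The penalty applies only on a fall."""
    w_d = 1.0 - w_s                 # constraint: w_S + w_D = 1
    p = penalty if fell else 0.0
    e_l = w_s * similarity_m + w_d * distance - p
    e_m = w_d * distance - p
    return e_l, e_m

# A non-falling trial with w_S = 0.6, the best-performing setting reported later.
e_l, e_m = evaluations(distance=1.2, similarity_m=0.8, fell=False, w_s=0.6)
```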

5. Experiments

We use a simulator of the humanoid robot (Robovie-PC) that we developed, instead of the real robot, for exhaustive exploration over the weight of the learning bias during imitation learning.

5.1. Motion Recognition

Figure 9 shows the recognition of the cycle period and phase of observed human walking motion. A human demonstrator first shows walking at various speeds, walking fast at the beginning and slowing down every 10 seconds (Fig. 9(a)). The human demonstrator then shows slow walking and speeds up every 10 seconds (Fig. 9(b)). The recognition results show that the cycle period τ_m and phase α_m of walking are detected well enough to imitate.

Figure 10 shows an example of motion recognition


Fig. 10. Example of motion recognition and segmentation.

and segmentation while the human demonstrator stands, waves his right hand, walks, bows, and stands again. The human motion is recognized successfully.

5.2. Imitation Learning

A human demonstrator shows walking at normal speed to the humanoid robot. The learning robot segments the demonstrated motion, and walking is selected here for imitation learning. The humanoid robot starts learning walking on its own based on the policy gradient method in Section 4.1 with the evaluation E_L defined in Section 4.2, including the learning bias based on the demonstrated walking. The initial via-point parameters for walking are the original ones of the Robovie-PC walking. To investigate how the learning bias affects learning, the weight for similarity w_S is varied from 0.0 to 1.0. If w_S = 0.0, the robot ignores the learning bias of imitation and tries to maximize the walking distance. If w_S = 1.0, the robot ignores the walking distance and only tries to imitate the demonstrated walking.

The robot walks 10 steps in one trial. The size of the parameter sets, M, is 100 for updating the via-point parameters for walking in this experiment. The number of updates is empirically set to 100 so that learning converges.

A conventional method is run to evaluate the proposed method. The comparative conventional method uses the motion demonstrated by a human being to set the initial parameters of the via-points for walking and then updates the parameters through trial and error, as follows: the best joint parameters of each via-point for mimicking the human demonstrated walking are searched for through trial and error, which takes 10,000 trials. It then starts to learn walking based on the same learning method as our proposed method but without the learning bias of imitation, i.e., with w_S = 0.0.

Figures 11 to 14 show the results of the experiments. Each learning curve is an average over 5 learning runs. Fig. 11 shows the average walking distance during learning with different weights of similarity w_S and with the conventional method.



Fig. 11. Average walking distance during learning.

Fig. 12. Cumulative number of falls during learning: (a) comparison to the conventional method; (b) comparison among various w_S.

All learning curves except those with w_S = 1.0 and the conventional method are similar. The learning curves with 0.0 < w_S < 1.0 show better performance than the curves with w_S = 0.0 or w_S = 1.0. w_S = 1.0 means that w_D = 0.0 and the learning robot tries to imitate the demonstrated walking while ignoring walking distance. It shows a shorter walking distance than the others but still performs better than at the beginning of learning. w_S = 0.0 means that the robot ignores the learning bias of imitation and just tries to maximize the walking distance. Nonetheless,


Fig. 13. Evaluation of walking EM during learning.


Fig. 14. Similarity of walking S during learning.

it shows worse performance than the others with w_S greater than 0. The weight of similarity w_S = 0.6 shows the best learning performance in the experiments. The conventional method actually shows faster walking than the proposed method. Note, however, that its learning started only after the best parameters for mimicking the human demonstrated walking were found through 10,000 trials.

Figure 12 shows the cumulative number of falls during learning. Fig. 12(a) shows the results of the conventional and proposed methods. The conventional method shows a much larger number of falls during learning, which means that it exhibits an unstable walking pattern while learning. The proposed method shows far fewer falls during learning, which means that walking during learning is stable. This is one of the big advantages of our proposed method.

Figure 12(b) shows the results without the conventional method. If the learning robot ignores the imitation bias given by motion similarity Sm, it walks unstably and often falls: wS = 0.0 actually yields the worst stability during learning. The learning robot with similarity weight wS = 1.0 shows the best stability, rarely falling during learning. As the similarity weight increases, stability tends to increase, with one exception: wS = 0.9 performs worse than wS = 0.6. The similarity weight wS = 0.6

538 Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.19 No.4, 2015


shows the best walking stability during learning in our experiments.

Figure 13 shows the evaluation of motion without the similarity weight, EM, defined by Eq. (12). The similarity weight wS = 0.6 yields a stable learning curve, whereas the curve for wS = 0.0 fluctuates because the robot falls frequently during learning and is often penalized for falling.

Figure 14 shows walking similarity S, defined by Eq. (2), during learning. All similarity curves except that for wS = 1.0 decrease during learning, indicating that walking tends to be shaped by the evaluation of walking distance rather than by similarity of walking. Fig. 11 shows that the learning bias based on walking similarity leads to better walking parameters, and Fig. 12 shows that this bias contributes to walking stability during learning. Learning bias based on walking similarity thus contributes to a better parameter search and to motion stability during learning, even though the evaluation of motion similarity decreases as learning proceeds.
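For intuition about what a trajectory-similarity measure like S computes, the following is a generic sketch. The paper's actual S is given by Eq. (2); the Gaussian-of-MSE form below is an assumption chosen only so that identical trajectories score 1.0 and similarity decays smoothly with joint-angle error.

```python
import math

def trajectory_similarity(demo, generated):
    """Illustrative similarity between two equal-length joint-angle
    trajectories (sequences of angles in radians): the exponential of
    the negative mean squared error, so identical trajectories score
    1.0. The paper's actual measure S is defined by Eq. (2)."""
    if len(demo) != len(generated):
        raise ValueError("trajectories must have equal length")
    mse = sum((d - g) ** 2 for d, g in zip(demo, generated)) / len(demo)
    return math.exp(-mse)
```

A measure of this shape makes the trade-off in Fig. 14 concrete: pushing the parameters toward longer walking distance moves the generated trajectory away from the demonstration, lowering the similarity score.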

5.3. Real Robot Experiments in Imitation Learning

We conducted real robot experiments to show the validity of our proposed imitation learning method. Figs. 15 and 16 show typical sequences of real robot walking before and after imitation learning. Walking before learning is stable but slow (Fig. 15). Walking learned on the basis of both walking distance and imitation similarity, wS = 0.9, is faster while being slightly unstable (Fig. 16).

Figure 17 shows a sequence of real robot walking learned on the basis of walking distance alone, without imitation of human walking. In general, this walking is too unstable to continue, although the walking distance is longer than that of the others. The upper body tends to swing back and forth, exploiting the dynamics for faster walking, but the robot nearly falls over.

Figure 18 shows a sequence of real robot walking learned on the basis of imitation similarity alone, with walking distance not considered. Despite this, it walks faster than the original while remaining stable. In the real robot experiments its walking speed is only slightly slower than that learned with both walking distance and imitation similarity, whereas the simulation results show a significant difference between them, possibly due to differences in friction between the feet and the floor and/or in the servo-motor dynamics of the joints.

6. Conclusions and Future Work

This paper has proposed two methods for humanoid robot imitation learning with via-point representation. One involves motion recognition and segmentation for imitation learning and recognizes both the motion itself and the period and phase of the periodic motion cycle. The other involves imitation learning that introduces a bias based on similarity to the motion demonstrated by a human being. We investigated learning performance with respect to the weight of this bias. Experimental results showed that the first method successfully recognized and segmented the

Fig. 15. Real robot walking pattern sequence before learning (snapshots at 0.0, 1.6, 3.2, 4.8, 6.4, and 8.0 s).


Fig. 16. Real robot walking pattern sequence after learning based on the evaluation of both walking distance and imitation similarity (snapshots at 0.0, 1.6, 3.2, 4.8, 6.4, and 8.0 s).

demonstrated motion, together with period and phase information on the periodic motion cycle, and that the learning bias contributes to better parameter search and motion

Fig. 17. Real robot walking pattern sequence after learning based on the evaluation of walking distance alone (snapshots at 0.0, 1.6, 3.2, 4.8, 6.4, and 8.0 s).

stability during learning. Future work should extend the proposed method to the imitation learning of new motion, even though the method


Fig. 18. Real robot walking pattern sequence after learning based on the evaluation of imitation similarity alone (snapshots at 0.0, 1.6, 3.2, 4.8, 6.4, and 8.0 s).

we proposed in this paper focuses on motion already in the robot's motion repertoire.

References:
[1] H. Miyamoto and M. Kawato, "A tennis serve and upswing learning robot based on bi-directional theory," Neural Networks, Vol.11, No.7-8, pp. 1331-1344, 1998.
[2] T. Inamura, Y. Nakamura, and I. Toshima, "Embodied symbol emergence based on mimesis theory," Int. J. of Robotics Research, Vol.23, No.4, pp. 363-377, 2004.
[3] Y. Okuzawa, S. Kato, M. Kanoh, and H. Itoh, "Acquisition and modification of motion knowledge using continuous HMMs for motion imitation of humanoids," IEEE Int. Symp. on Micro-Nano Mechatronics and Human Science, pp. 586-591, 2009.
[4] Y. Okuzawa, S. Kato, M. Kanoh, and H. Itoh, "Motion recognition and modifying motion generation for imitation robot based on motion knowledge formation," The Trans. of the Institute of Electrical Engineers of Japan C, Vol.131, pp. 655-663, March 2011 (in Japanese).
[5] Y. Okuzawa, S. Kato, M. Kanoh, and H. Itoh, "Motion recognition and modifying motion generation for imitation robot based on motion knowledge formation," IEEJ Trans. on Electronics, Information and Systems, Vol.131, No.3, pp. 655-663, 2011.
[6] S. Whitehead, J. Karlsson, and J. Tenenberg, "Learning multiple goal behavior via task decomposition and dynamic policy merging," in Robot Learning (J. H. Connell and S. Mahadevan, eds.), Ch.3, pp. 45-78, Kluwer Academic Publishers, 1993.
[7] M. Asada, S. Noda, S. Tawaratumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, Vol.23, pp. 279-303, 1996.
[8] S. Schaal, A. Ijspeert, and A. Billard, "Computational approaches to motor learning by imitation," pp. 199-218, No.1431, Oxford University Press, 2004.
[9] Y. Takahashi, Y. Tamura, M. Asada, and M. Negrello, "Emulation and behavior understanding through shared values," Robotics and Autonomous Systems, Vol.58, pp. 855-865, July 2010.
[10] K. Hamahata, T. Taniguchi, K. Sakakibara, I. Nishikawa, K. Tabuchi, and T. Sawaragi, "Effective integration of imitation learning and reinforcement learning by generating internal reward," Proc. of the 8th Int. Conf. on Intelligent Systems Design and Applications (ISDA '08), Vol.3, pp. 121-126, IEEE Computer Society, 2008.
[11] N. Kohl and P. Stone, "Policy gradient reinforcement learning for fast quadrupedal locomotion," Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 2619-2624, 2004.
[12] S. Takahashi, Y. Takahashi, Y. Maeda, and T. Nakamura, "Kicking motion imitation of inverted-pendulum mobile robot and development of body mapping from human demonstrator," J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.15, No.8, pp. 1030-1038, 2011.
[13] Y. Takahashi, T. Kimura, Y. Maeda, and T. Nakamura, "Body mapping from human demonstrator to inverted-pendulum mobile robot for learning from observation," Proc. of WCCI 2012 IEEE World Congress on Computational Intelligence, pp. 945-950, 2012.
[14] Vstone Co., Ltd., "PRODUCTS - Vstone Co., Ltd." Available at: http://www.vstone.co.jp/english/products.html [Accessed July 3, 2015].


Name: Yasutake Takahashi
Affiliation: Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
Address: 3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
Brief Biographical History:
2000-2009 Assistant Professor, Department of Adaptive Machine Systems, Graduate School of Engineering, Osaka University
2006-2007 Visiting Researcher, Fraunhofer IAIS
2009-2012 Senior Assistant Professor, Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
2012- Associate Professor, Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
Main Works:
• Y. Takahashi, K. Noma, and M. Asada, "Efficient Behavior Learning based on State Value Estimation of Self and Others," Advanced Robotics, Vol.22, No.12, pp. 1379-1395, 2008.
• Y. Takahashi, Y. Tamura, M. Asada, and M. Negrello, "Emulation and Action Understanding through Shared Values," Robotics and Autonomous Systems, Vol.58, No.7, pp. 855-865, 2010.
• Y. Takahashi, T. Ishii, C. Todoroki, Y. Maeda, and T. Nakamura, "Fuzzy Control for a Kite-Based Tethered Flying Robot," J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.19, No.3, pp. 349-358, 2015.
Membership in Academic Societies:
• The Robotics Society of Japan (RSJ)
• Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
• The Japanese Society for Artificial Intelligence (JSAI)
• The Institute of Electrical and Electronics Engineers (IEEE)

Name: Hiroki Hatano
Affiliation: Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
Address: 3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
Brief Biographical History:
2010-2014 Faculty of Engineering, University of Fukui
2014- Graduate School of Engineering, University of Fukui
Membership in Academic Societies:
• Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)

Name: Yosuke Maida
Affiliation: Department of Human and Artificial Intelligent Systems, Faculty of Engineering, University of Fukui
Address: 3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
Brief Biographical History:
2010-2014 Faculty of Engineering, University of Fukui
2014- MEITEC CORPORATION

Name: Kazuyuki Usui
Affiliation: Department of Human and Artificial Intelligent Systems, Graduate School of Engineering, University of Fukui
Address: 3-9-1 Bunkyo, Fukui, Fukui 910-8507, Japan
Brief Biographical History:
2009-2013 Faculty of Engineering, University of Fukui
2013- Hokusin Co., Ltd.


Name: Yoichiro Maeda
Affiliation: Department of Manufacturing Technologists, Institute of Technologists
Address: 333 Maeya, Gyoda City, Saitama 361-0038, Japan
Brief Biographical History:
1983- Researcher, Central Research Lab., Mitsubishi Electric Corp.
1989-1992 Senior Researcher, Laboratory for International Fuzzy Engineering Research (LIFE)
1995- Associate Professor, Osaka Electro-Communication University
1999-2000 Visiting Researcher, University of British Columbia (UBC), Canada
2002- Associate Professor, Faculty of Engineering, University of Fukui
2007- Professor, Graduate School of Engineering, University of Fukui
2013- Professor, Faculty of Engineering, Osaka Institute of Technology
2015- Professor, Department of Manufacturing Technologists, Institute of Technologists
Main Works:
• Y. Maeda and Y. W. Shimizuhira, "Multi-Layered Fuzzy Behavior Control for Autonomous Mobile Robot with Multiple Omnidirectional Vision System: MOVIS," J. of Advanced Computational Intelligence and Intelligent Informatics, Vol.11, No.1, pp. 21-27, 2007.
• Y. Maeda and R. Taki, "Interactive Emotion Communication Between Human and Robot," Int. J. of Innovative Computing, Information and Control, Vol.7, No.5 (B), pp. 2961-2970, 2011.
• M. Wang, Y. Maeda, and Y. Takahashi, "Visual Attention Region Prediction Based on Eye Tracking Using Fuzzy Inference," J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.18, No.4, pp. 499-510, 2014.
Membership in Academic Societies:
• The Society of Instrument and Control Engineers (SICE)
• The Robotics Society of Japan (RSJ)
• Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
• The Japanese Society for Artificial Intelligence (JSAI)
• Japan Society of Kansei Engineering (JSKE)
