Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be...

13
Robotics and Autonomous Systems 47 (2004) 79–91 Learning from demonstration and adaptation of biped locomotion Jun Nakanishi a,b,, Jun Morimoto a,b , Gen Endo a,c , Gordon Cheng a,b , Stefan Schaal a,d , Mitsuo Kawato a,b a ATR Computational Neuroscience Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan b ICORP, Japan Science and Technology Agency, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan c Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141-0001, Japan d Department of Computer Science, University of Southern California, 3641 Watt way, Los Angeles, CA 90089-2520, USA Abstract In this paper, we introduce a framework for learning biped locomotion using dynamical movement primitives based on non-linear oscillators. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-like locomotion. We suggest dynamical movement primitives as a central pattern generator (CPG) of a biped robot, an approach we have previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned through movement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically by a novel frequency adaptation algorithm based on phase resetting and entrainment of coupled oscillators. Numerical simulations and experimental implementation on a physical robot demonstrate the effectiveness of the proposed locomotion controller. © 2004 Elsevier B.V. All rights reserved. Keywords: Biped locomotion; Learning from demonstration; Dynamical movement primitives; Phase resetting; Frequency adaptation 1. Introduction There has been a growing interest in biped loco- motion with the recent development of advanced hu- manoid robots. Many of existing successful walking algorithms use the zero moment point (ZMP) crite- rion [1] for off-line motion planning [2,3] and on-line balance compensation [4–6]. These ZMP-based meth- ods have been shown to be effective to achieve biped locomotion in legged robots with flat feet. However, they require precise modeling of the robot dynamics and high-gain trajectory tracking control, and the gen- erated patterns result in a typical ‘bent-knee’ posture to avoid singularities of inverse kinematics. From the Corresponding author. E-mail address: [email protected] (J. Nakanishi). viewpoint of energy efficiency, such walking patterns are not desirable since torque must be continuously applied to the knee joint to maintain a bent-knee posture. These previous ZMP approaches have pri- marily focused on the ability of executing planned movements at any instance by ensuring surface con- tact between the sole and the ground [7] rather than natural human-like motion which exploits passive dynamics of the body [8]. In contrast to off-line trajectory planning, biologi- cally inspired control approaches based on central pat- tern generators (CPGs) with neural oscillators have been drawing much attention for rhythmic motion gen- eration. As a CPG, a neural oscillator proposed by Matsuoka [9] is widely used, which models the fir- ing rate of two mutually inhibiting neurons described in a set of differential equations. This model is used 0921-8890/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.robot.2004.03.003

Transcript of Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be...

Page 1: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

Robotics and Autonomous Systems 47 (2004) 79–91

Learning from demonstration and adaptation of biped locomotion

Jun Nakanishia,b,∗, Jun Morimotoa,b, Gen Endoa,c, Gordon Chenga,b,Stefan Schaala,d, Mitsuo Kawatoa,b

a ATR Computational Neuroscience Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japanb ICORP, Japan Science and Technology Agency, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan

c Sony Corporation, 6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo 141-0001, Japand Department of Computer Science, University of Southern California, 3641 Watt way, Los Angeles, CA 90089-2520, USA

Abstract

In this paper, we introduce a framework for learning biped locomotion using dynamical movement primitives based onnon-linear oscillators. Our ultimate goal is to establish a design principle of a controller in order to achieve natural human-likelocomotion. We suggest dynamical movement primitives as a central pattern generator (CPG) of a biped robot, an approach wehave previously proposed for learning and encoding complex human movements. Demonstrated trajectories are learned throughmovement primitives by locally weighted regression, and the frequency of the learned trajectories is adjusted automatically bya novel frequency adaptation algorithm based on phase resetting and entrainment of coupled oscillators. Numerical simulationsand experimental implementation on a physical robot demonstrate the effectiveness of the proposed locomotion controller.© 2004 Elsevier B.V. All rights reserved.

Keywords: Biped locomotion; Learning from demonstration; Dynamical movement primitives; Phase resetting; Frequency adaptation

1. Introduction

There has been a growing interest in biped loco-motion with the recent development of advanced hu-manoid robots. Many of existing successful walkingalgorithms use the zero moment point (ZMP) crite-rion [1] for off-line motion planning[2,3] and on-linebalance compensation[4–6]. These ZMP-based meth-ods have been shown to be effective to achieve bipedlocomotion in legged robots with flat feet. However,they require precise modeling of the robot dynamicsand high-gain trajectory tracking control, and the gen-erated patterns result in a typical ‘bent-knee’ postureto avoid singularities of inverse kinematics. From the

∗ Corresponding author.E-mail address: [email protected] (J. Nakanishi).

viewpoint of energy efficiency, such walking patternsare not desirable since torque must be continuouslyapplied to the knee joint to maintain a bent-kneeposture. These previous ZMP approaches have pri-marily focused on the ability of executing plannedmovements at any instance by ensuring surface con-tact between the sole and the ground[7] rather thannatural human-like motion which exploits passivedynamics of the body[8].

In contrast to off-line trajectory planning, biologi-cally inspired control approaches based on central pat-tern generators (CPGs) with neural oscillators havebeen drawing much attention for rhythmic motion gen-eration. As a CPG, a neural oscillator proposed byMatsuoka[9] is widely used, which models the fir-ing rate of two mutually inhibiting neurons describedin a set of differential equations. This model is used

0921-8890/$ – see front matter © 2004 Elsevier B.V. All rights reserved.doi:10.1016/j.robot.2004.03.003

Page 2: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

80 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

in robotic applications to achieve designated tasks in-volving rhythmic motion which requires interactionsbetween the system and the environment. Examplesinclude biped locomotion[10–13], quadruped locomo-tion [14], juggling [15], drumming[16], and playingwith a slinky toy [17]. Neural oscillators have desir-able properties such as adaptation to the environmentthrough entrainment. However, it is difficult to designinterconnection and feedback pathways of neural os-cillators, and to manually tune all open parameters inorder to achieve the desired behavior.

Other approaches include[18] in an effort to de-sign a simple controller based on physical intuition.The control strategy proposed in Ref.[18] is quitesimple and easy to implement. However, it requiresmanual tuning of the control parameters and accuratetorque-controlled actuators.

In this paper, we suggest an approach to learningbiped locomotion from demonstration and its adapta-tion through coupling between the pattern generatorand the mechanical system. Motivated by human’scapability of learning and imitating demonstratedmovements of a teacher, imitation learning has beenexplored as an efficient method for motor learning inrobots to accomplish desired movements[19–21]. Inour previous work, we proposed dynamical movementprimitives to encode complex discrete and rhythmicmulti-joint movements through imitation learning[22]. Dynamical movement primitives are formulatedas a set of autonomous non-linear differential equa-tions with well-defined attractor dynamics. Demon-strated trajectories are learned using locally weightedregression, and the output of dynamical movementprimitives serves as kinematic movement plans, e.g.,desired trajectories, for a robot.

This paper presents the idea of using the rhythmicmovement primitives based on phase oscillators[22]as a CPG to learn biped locomotion from demon-stration. Compared with neural oscillators, one of theappealing properties of phase oscillators is that thedesired phase relationship among oscillators can bespecified in a straightforward manner. In Ref.[23], acomprehensive formulation of phase coordination ofcoupled phase oscillators is proposed. Applicationsof coupled phase oscillators have been explored inthe gait control of multi-legged robots[24,25] andthe control of a biped robot[26]. In addition to usingphase oscillators, our movement primitive has various

desirable properties which are beneficial for bipedlocomotion. For example, it can learn a demonstratedtrajectory rapidly, and it is easy to re-scale the learnedrhythmic movement in terms of amplitude, frequencyand offset of the patterns[22]. In the application ofrhythmic movement primitives to biped locomotion,we introduce coupling terms to the movement primi-tives to achieve the desired phase relationship amonglimbs following the formulation proposed in Ref.[23]. We also propose an adaptation algorithm for thefrequency of walking based on phase resetting[27]and entrainment between the phase oscillator andmechanical system using feedback from the environ-ment. Frequency adaptation of a CPG is beneficialwhen the desired frequency of the coupled system isnot exactly known in advance.

In Ref. [26], a similar idea of using coupled phaseoscillators as a pattern generator for biped locomotionwas proposed. In their method, desired joint trajecto-ries of the legs are generated from a nominal trajec-tory at the tip of each leg defined by a combination ofsimple prescribed functions of phase through inversekinematics[26]. In comparison to Ref.[26], we be-lieve that our method has the advantage of flexibilityin encoding complex movements by imitation learn-ing and the potential capability of improving learnedmovements through reinforcement learning[28]. Wedemonstrate the effectiveness of the proposed controlstrategy by numerical simulations and experimentalimplementation.

2. Biped robot

We use a planar 5-link biped robot developed in Ref.[29]. The height of the robot is 0.4 m and the weight isabout 2 kg. The length of each link of the leg is 0.2 m.The mass of the body is 1.0 kg, the thigh is 0.43 kg andthe shank is 0.05 kg. The motion of the robot is con-strained within the sagittal plane by a tether boom. Thehip joints are directly actuated by direct drive motors,and the knee joints are driven by direct drive motorsthrough a wire transmission mechanism with the re-duction ratio of 2.0. These transmission mechanismswith low reduction ratio provide high back drivabil-ity at the joints. Foot contact with the ground is de-tected by foot switches. The robot is an underactuatedsystem having rounded soles with no ankles. Thus, it

Page 3: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 81

is challenging to design a controller to achieve bipedlocomotion with this robot since no actuation can beapplied between the stance leg and the ground com-pared to many of the existing biped robots which haveflat feet with ankle joint actuation.

3. Dynamical movement primitives

In this section, we outline the rhythmic dynamicalmovement primitives originally proposed in Ref.[22],which we will use as a CPG for biped locomotion inthis paper.

3.1. Rhythmic dynamical movement primitives

Rhythmic dynamical movement primitives encodeperiodic behavioral patterns as an output of a set ofnon-linear dynamical systems composed of acanon-ical dynamical system with a phase oscillator and atransformation dynamical system with a non-linearfunction approximator.

Consider the following limit cycle oscillator char-acterized in terms of an amplituder and a phaseφ asa canonical dynamical system which generates basicrhythmic patterns:

τφ = 1, (1)

τr = −µ(r − r0), (2)

where τ is a temporal scaling factor,r0 determinesthe desired (relative) amplitude, andµ is a positiveconstant. Note that the phase dynamics(1) can bewritten as

φ = ω, (3)

whereω = 1/τ is the natural frequency. When thereare multiple oscillators, we will introduce couplingterms among the oscillators (seeSection 4.1). Thisrhythmic canonical system is designed to provide anamplitude signalv = [r cosφ, r sin φ]T and phasevariable mod(φ, 2π) to the following second-ordertransformation dynamical system (z, y), where the out-put y is used as the desired trajectory for the robot:

τz = αz(βz(ym − y) − z), (4)

τy = z + f(v, φ), (5)

where α and β are time constants,ym is an offsetof the output trajectory.f is a non-linear functionapproximator using local linear models[30] of theform

f(v, φ) =∑N

k=1 ΨkwTk v∑N

k=1 Ψk

, (6)

wherewk is the parameter vector of thekth local modelwhich will be determined by locally weighted learn-ing [30] from a demonstrated trajectoryydemo (seeSection 3.2). Each local model is weighted by a Gaus-sian kernel function

Ψk = exp(−hk( mod(φ, 2π) − ck)2), (7)

whereck is the center of thekth linear model, andhk

characterizes its width. A final prediction is calculatedby the weighted average of the predictions of the indi-vidual models. As demonstrated in Ref.[22], the am-plitude, frequency and offset of the learned rhythmicpatterns can be easily modified by scaling the param-etersr0, ω(= 1/τ) andym individually.

3.2. Imitation learning with dynamical movementprimitives

An important issue is how to learn the parameterswk in the non-linear function(6) to characterize theoutput of a dynamical movement primitive for a givendemonstrated trajectoryydemo. Given a sampled datapoint (ftarget, v) at t where

ftarget= τydemo− zdemo (8)

and

τzdemo= αz(βz(ym − ydemo) − zdemo),

the learning problem is formulated to find the param-eterswk in Eq. (6)using incremental locally weightedregression technique[30] in which wi is updated by

wt+1k = wt

k + ΨkPt+1k vek, (9)

where

Pt+1k = 1

λ

(Pt

k − PtkvvT Pt

k

(λ/Ψk) + vT Ptkv

),

ek = ftarget− wTk v

Page 4: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

82 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

and λ ∈ [0, 1] is a forgetting factor. We chose thislocally weighted regression framework as it can au-tomatically find the correct number of necessary ba-sis function, and can tune thehk parameters of eachGaussian kernel function(7) to achieve higher func-tion approximation accuracy. Moreover, it learns theparameterswk of every local modelk totally indepen-dent of all other local models, which minimizes inter-ference between local models and provides a meansto robustly classify a rhythmic pattern with the helpof the parameterswk [22].

4. Rhythmic dynamical movement primitives as aCPG

We use the rhythmic dynamical movement primi-tives introduced inSection 3.1as a CPG for bipedlocomotion.Fig. 1 illustrates the proposed control ar-chitecture in this paper. Each joint is equipped witha movement primitive which generates the desiredjoint trajectoryθdes. We define the index and the cor-responding name of the joint as Left hip (i = 1,L HIP), and Left knee (i = 2, L KNEE), Right hip(i = 3, R HIP), and Right knee (i = 4, R KNEE).An additional oscillator (φref) is allocated to providea reference phase signal to the limb oscillators, whichis adjusted by the ground contact information at theinstance of heel strike.Section 4.1introduces cou-pling to the oscillators of the movement primitivesto achieve the desired phase relationship between the

Fig. 1. Proposed control architecture for biped locomotion withdynamical movement primitives.

limbs.Section 4.2proposes a frequency adaptation al-gorithm of the learned periodic movements throughthe interaction among the coupled oscillators, robotand environment.

4.1. Inter- and intra-limb phase coordination

We introduce coupling among the oscillators to reg-ulate the desired phase relationship between the limbsof the robot. This kind of coupling is motivated froma biological point of view where it has been hypothe-sized that coupling among neural oscillators plays animportant role in coordinating the desired phase re-lationship of limb movements in locomotion and gaittransition[31].

Consider the following coupling terms for the os-cillator i:

φi = ωi + κ

N∑i=1

Cij sin(φj − φi), (10)

whereκ is a positive constant gain, andCij is an el-ement of then × n matrix C which characterizes thecoupling with other oscillators. This form of couplingappears in various studies of coupled oscillators andtheir application, e.g.,[23–26,32,33]. In this paper, weemploy the formulation in Ref.[23] to specify the de-sired phase relationship. In Ref.[23], C is defined tobe a symmetric matrix where the diagonal elementsareCii = 0 for all i, and off-diagonal elementsCij arechosen as follows:

• Cij = 1: oscillatorsi and j are designed to be inphase such thatφi − φj = 0 (mod 2π).

• Cij = −1: oscillatorsi andj are designed to be outof phase such thatφi − φj = π (mod 2π).

As noted in Ref.[23], an arbitrary phase differenceother than 0 orπ can be specified by introducing achange of coordinates, or equivalent to having an offsetin the coupling terms.

In this paper, we design the desired phase differenceamong the canonical oscillators such that the links ofeach leg move in phase (with zero phase difference),and the left and right legs move out of phase (withπ phase difference) by defining the phase of the os-cillator as φi = 0 at the instance of heel strike ofthe corresponding leg. More specifically, we requireφ1 − φ2 = 0, φ3 − φ4 = 0, φ1 − φ3 = π, and

Page 5: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 83

φ2−φ4 = π. Thus, the connection matrixC is chosento be

C =

0 1 −1 −1

1 0 −1 −1

−1 −1 0 1

−1 −1 1 0

. (11)

4.2. Frequency adaptation of locomotion

Section 4.1introduced internal coupling of the os-cillators to coordinate the phase difference among thelimbs of the robot. This section considers interac-tion between the environment and the CPG to achieveself-tuning of the natural frequency of the oscillatorsand synchronization of the CPG with the periodic be-havior of the robot.

4.2.1. Synchronization of coupled oscillators withfrequency adaptation

Before introducing the proposed frequency adapta-tion law for the biped robot CPG, let us consider thebehavior of the following dynamics of two coupledoscillators:

φ1 = ω1 + K1(φ2 − φ1), (12)

φ2 = ω2 + K2(φ1 − φ2), (13)

whereω1, ω2 > 0 are natural frequencies of the os-cillators, andK1, K2 are positive coupling constants.Then, the oscillators run with the phase differenceψ∗ = φ2 − φ1 = (ω2 − ω1)/(K1 + K2) at the coupledfrequencyω∗ = (K2ω1+K1ω2)/(K1+K2) when theyare entrained. Whenω1 = ω2, the phase differenceψ = φ2 −φ1 remains non-zero. However, ifω1 = ω2,then the phase difference of these oscillators will bezero. Thus, we introduce an update law of the natu-ral frequencyω1 to achieve synchronization of theseoscillators with zero phase difference:

φ1(t) = ω1(t) + K1(φ2(t) − φ1(t)), (14)

ω1(t) = K(ω2 − ω1(t)), (15)

φ2(t) = ω2 + K2(φ1(t) − φ2(t)), (16)

whereK is a positive constant. It is straightforwardto see thatω1 → ω2 as t → ∞. Thus, the phasedifference will be zero such thatψ = φ2 − φ1 → 0.

4.2.2. Frequency update law and phase resetting ofCPG

In this section, we introduce an adaptation algo-rithm of the CPG in order to adjust the frequency ofthe learned periodic motions by the robot through theinteraction among the CPG, robot and environment.As depicted inFig. 1, the proposed control systemcan be regarded as a coupling of the CPG and the me-chanical oscillator (robot) which is analogous to thecoupled oscillator system discussed inSection 4.2.1.

For this purpose, we first introduce a referenceoscillator (φref) which will be synchronized with thelocomotion of the robot through the adaptation mech-anism described below. This reference oscillator canbe considered as a phase estimator of locomotion bythe discrete heel strike information detected by footswitches. Then, additional coupling is introduced tothe limb oscillators withφref to achieve the desiredrelative phaseφ1 = φ2 = φref andφ3 = φ4 = φref+π.

Motivated by the synchronization mechanism of thecoupled oscillators inSection 4.2.1, we propose thefollowing phase resetting and frequency update law.They can be interpreted as a discretized version ofphase coupling(14) and frequency update(15) at theinstance of heel strike:

φref = ωnref + δ(t − theel strike)(φ

robotheel strike− φref), (17)

ωn+1ref = ωn

ref + K(ωnmeasured− ωn

ref), (18)

whereδ is the Dirac’s delta function,n is the numberof steps, andφrobot

heel strikeis the phase of the mechanicaloscillator (robot) at heel strike defined asφrobot

heel strike=0 at the heel strike of the left leg, andφrobot

heel strike =π at the heel strike of the right leg.ωn

measuredis themeasured frequency of locomotion defined by

ωnmeasured=

π

T nmeasured

, (19)

whereT nmeasuredis the stepping period of locomotion

(half period with respect to the oscillator). At the sametime, natural frequencies of all the limb oscillatorsωi

are updated at the instance of heel contact such thatωi = ωn+1

ref . Note that it is possible to directly intro-duce phase resetting to the limb oscillators as seen inRef. [26]. However, introduction of a reference oscil-lator allows phase estimation depending on multipleevents and multi-modal information. Moreover, con-tinuous phase coupling of the limb oscillators with the

Page 6: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

84 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

15 15.5 16 16.5 17 17.5 0.6

0.4

0.2

0

0.2

0.4

Time (sec)

L_H

IP (

rad)

L_HIPL_HIP desL heel strkeR heel strike

15 15.5 16 16.5 17 17.5 0.2

0

0.2

0.4

0.6

0.8

1

Time (sec)

L_K

NE

E (

rad)

L_KNEEL_KNEE desL heel strkeR heel strike

Fig. 2. Joint trajectories for the left leg and heel strike timing for four periods (eight steps) of walking (simulation).

reference oscillator having phase resetting alleviatesthe problem of discontinuity to the desired joint tra-jectories.

The phase resetting algorithm(17) is motivatedfrom a biological perspective as well as a mathe-matical point of view. Phenomena of phase resettingor phase shift are observed in many biological os-cillators resulting from external perturbations, e.g.,circadian pacemakers, biochemical oscillators, andhuman finger tapping neural networks[27]. Phaseresetting is related to the stability properties of neu-ral rhythms, which can be analyzed by examiningthe phase-dependent responses against perturbations.A recent study[34] investigated the role of phaseresetting in biped locomotion. Numerical studies inRef. [34] suggest possible contribution of phase reset-ting during walking to gait stability against externalperturbations.

5. Numerical simulations

As a demonstrated trajectory, we use the mo-tion capture data of human walking in Ref.[35](29-year-old male, 173 cm, 83.5 kg, right hip andknee). We identified the period and frequency ofthis pattern by the power spectrum estimation withFFT and autocorrelation asT = 1.17 s andf =1/T = 0.855 Hz, respectively. The dynamics of therobot are derived using SD/FAST1 and integratedusing the Runge–Kutta algorithm at 1 ms step size.

1 http://www.sdfast.com

The ground contact force is calculated using a linearspring-damper model. A low-gain PD controller isused at each joint to track the desired trajectory whichis the output of the movement primitive.

A walking pattern from the demonstrated trajectoryis learned with the dynamical primitives. We manu-ally designed the desired trajectory for the initial stepof locomotion from a standing position at rest, andthe proposed CPG controller is activated at heel con-tact of the first step. The amplitude parameter of thedynamical primitives is set tor0 = 0.7, and the off-setym = 0.375 is introduced to the knee joints. Forthe scaling of the natural frequency of the oscillator,the adaptation law proposed inSection 4.2.2is usedwith the initial frequency ofω = 4.83 rad/s (period ofoscillation is 1.3 s). These parameters are determinedempirically from trial and error.

Fig. 2 illustrates the desired and actual joint trajec-tories for the left leg, and the timing of heel strikeafter a stable pattern was learned by the phase reset-ting algorithm.Fig. 3 shows the torque command forthe left leg, which indicates that the knee joint swingspassively since it requires almost no torque (seet =15.1–15.3 s).Fig. 4depicts one step of walking.Fig. 5(left) shows the adaptation of the period of locomotionandFig. 5 (right) shows the learning curve of the fre-quency of the CPG with different coupling constantsK = 0.2, 0.5 and 0.8 inEq. (18). The stepping periodapproached 0.387 s, and the resultant CPG frequencywasω = 8.12 rad/s, which roughly corresponds to thenatural frequency of the swing leg modeled as a sim-plified linear pendulum, using the proposed adaptationlaw.

Page 7: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 85

15 15.5 16 16.5 17 17.5 1

0.5

0

0.5

1

Time (sec)

L_H

IP to

rque

(N

m) L_HIP torque

L heel strkeR heel strike

15 15.5 16 16.5 17 17.5 1

0.5

0

0.5

1

Time (sec)

L_K

NE

E to

rque

(N

m)

L_KNEE torqueL heel strkeR heel strike

Fig. 3. Torque command to the left hip and knee joints for four periods (eight steps) of walking (simulation).

Fig. 4. Snapshots of walking simulation for one step at 15 frames/s (1 frame≈ 66 ms).

Robustness against external perturbations is evalu-ated by pushing the robot forward and backward withexternal forces. Forces are applied for a duration of0.1 s at different timing during a single step (at an in-terval of 0.1 rad from 0 to 2π of the phase of the ref-erence oscillator). When a forward perturbing force isapplied, the robot could cope with up to 9.1 N (max) atφ = 1.1 rad and 2.2 N (min) atφ = 2.7 rad of the per-turbing forces. When a backward perturbing force isapplied, the robot could cope with up to−2.4 N (max)

0 50 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Number of steps

Hal

f per

iod

(sec

)

K=0.2K=0.5K=0.8

0 50 1000

1

2

3

4

5

6

7

8

9

10

Number of steps

omeg

a (r

ad/s

)

K=0.2K=0.5K=0.8

Fig. 5. Frequency adaptation of walking via entrainment (simulation). Left: adaptation of period. Right: learning curve of the frequencyof the CPG.

at φ = 4.9 rad and−1.0 N (min) atφ = 0.4 andφ =0.5 rad of the perturbing forces. In contrast, withoutphase resetting, the robot only could cope with muchsmaller disturbances, for example, the robot only tol-erated up to 3.9 N of the forward perturbing force ap-plied atφ = 1.1.

The simulation results demonstrate self-adaptationof the frequency of locomotion and robustness ofwalking against disturbance by the proposed algo-rithm.

Page 8: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

86 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

16 16.5 17 17.5 18 18.5 1

0.5

0

0.5

Time (sec)

L_H

IP (

rad)

L_HIPL_HIP desL heelstrkeR heelstrike

16 16.5 17 17.5 18 18.5 0.5

0

0.5

1

Time (sec)

L_K

NE

E (

rad)

L_KNEEL_KNEE desL heelstrkeR heelstrike

Fig. 6. Joint trajectories for the left leg and heel strike timing for four periods (eight steps) of walking (experiment).

6. Experimental implementation

We implemented the proposed control frameworkon our biped robot. In the experimental implementa-tion, our initial attempt to achieve biped locomotionusing the human demonstrated trajectory was notsuccessful largely due to mechanical limitation of theexperimental system and discrepancy in the groundcontact condition between simulations and experi-ments. Thus, we used another target trajectory whichwas experimentally obtained from an actual trajectoryof successful robot locomotion using a state machinecontroller. The state machine controller is designed tocoordinate the leg movements with the physical stateof the legged system based on the idea presented inRef. [36].

16 16.5 17 17.5 18 18.5 2

1

0

1

2

Time (sec)

L_H

IP to

rque

(N

m) L_HIP torque

L heelstrkeR heelstrike

16 16.5 17 17.5 18 18.5 2

1

0

1

2

Time (sec)

L_K

NE

E to

rque

(N

m)

L_KNEE torqueL heelstrkeR heelstrike

Fig. 7. Torque command to the left hip and knee joints for four periods (eight steps) of walking (experiment).

To initiate locomotion in the experiments, we firstsuspended the robot with the legs swinging in theair, and then placed the robot on the ground man-ually. Thus, the initial condition of each run wasnot consistent, and occasionally the robot could notstart walking or fell over after a couple of stepswhen the timing was not appropriate.Fig. 6 illus-trates the desired and actual joint trajectories for theleft leg, and the timing of heel strike.Fig. 7 showsthe torque command for the left leg. Some oscilla-tion in the torque command for the hip joint canbe seen. This is due to noisy joint velocity signalsobtained from numerically differentiated joint an-gles measured by optical encoders. We are currentlyplanning to use gyros to obtain smoother joint ve-locities to improve the performance of the trackingcontroller. Note that a limit on the torque command

Page 9: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 87

0 50 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Number of steps

Hal

f per

iod

(sec

)

0 50 1000

1

2

3

4

5

6

7

8

9

10

Number of steps

omeg

a (r

ad/s

)

Fig. 8. Frequency adaptation of walking via entrainment (experiment). Left: adaptation of period. Right: learning curve of the frequencyof the CPG.

is imposed at±1.5 Nm. Fig. 8 (left) shows the pe-riod adaptation andFig. 8 (right) shows the learningcurve of the frequency of the CPG. Stepping pe-riod for a typical walking experiment was around0.37 s. In this experiment, the initial frequency of theoscillator was set toω = 5.71 rad/s (period of oscil-lation is 1.1 s), and the adaptation gain inEq. (18)was decreased according to an annealing procedureK = K0/n, where K0 = 0.05 andn is the num-ber of steps, as is needed in most gradient descentprocedure. We introduced an offsetα for phaseresetting

φref = ωn + δ(t − theel strike)(φrobotheel strike− φref + α)

(20)

to adjust the timing of foot contact, whereα is chosento be α = 0.8 rad. These parameters are determinedempirically. Note that phase resetting with an offseteffectively changes the period of oscillation.

Fig. 9. Walking over surfaces with different friction properties and a seesaw-like metal sheet with a slight change in the slope.

Robustness of the proposed algorithm is evaluatedby testing walking over surfaces with different fric-tion properties such as carpet, cork sheet (3 mm thick)and a seesaw-like metal plate (2 mm thick). The metalplate was placed so that the inclination of the slopeslightly changes like a seesaw when the robot walksover it (the height of the center is 7 mm). The robotcould deal with the change in the environment asdepicted inFig. 9.

Note that even if we use the learned trajectory fromthe actual robot walking pattern, the robot could notwalk by just replaying it as a desired trajectory. Phaseresetting using foot contact information was necessary.This implies that appropriate on-line adjustment of thephase of the CPG by sensory feedback from the envi-ronment is essential to achieve successful locomotion.In addition, empirically we found that the proposedcontroller achieved much more robust walking com-pared to the state machine-based controller which weoriginally designed.

Page 10: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

88 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

7. Conclusion

In this paper, we proposed a method for learn-ing biped locomotion from demonstration and itsfrequency adaptation using dynamical movementprimitives. In the dynamical movement primitives,kinematic movement plans are described in a set ofnon-linear differential equations with well-definedattractor dynamics, and demonstrated trajectories arelearned using locally weighted regression. Specifi-cally, we use rhythmic dynamical movement primi-tives based on coupled phase oscillators as a CPG, andintroduced a frequency adaptation algorithm throughinteractions among the CPG, mechanical system andthe environment motivated by the synchronization ofcoupled oscillators. Numerical simulations and ex-perimental result demonstrate the effectiveness of theproposed control algorithm to achieve steady-statewalking roughly at the natural frequency of the cou-pled system. We also evaluated robustness againstdisturbance in numerical simulations and experiments.

Future work will address initiation and terminationof walking, and on-line balance compensation. Wewill also consider collection of human’s walking dataunder various behavioral conditions. In our currentstudy, we used a simple phase resetting mechanismin which the phase of the CPG is forced to be reset toa specific value at the instance of heel strike regard-less of the current phase of the CPG. In the future,we are interested in the generalization of the idea ofphase resetting to determine phase-dependent reactionagainst external perturbations such as recovery fromstumbling by designing an appropriate phase resettingcurve[27]. Formal mathematical analysis will be re-quired to understand the principle of periodic stabilityof a limit cycle solution to the dynamics of a combinedoscillator and mechanical system. In the long run, weare hopeful that our approach may provide insight intoa theoretically sound design principle of biped loco-motion control to achieve human-like natural walking.

Acknowledgements

We would like to thank Auke Ijspeert at EPFL,Swiss Federal Institute of Technology, Lausanne andSeiichi Miyakoshi of the Digital Human ResearchCenter, AIST, Japan, and Chris Atkeson at Carnegie

Mellon University, for valuable discussions and help-ful comments.

This research was supported in part by National Sci-ence Foundation grants ECS-0325383, IIS-0312802,IIS-0082995, ECS-0326095, ANI-0224419, a NASAgrant AC#98-516, an AFOSR grant on IntelligentControl, the Communications Research Laboratory ofJapan, the ERATO Kawato Dynamic Brain Project,funded by the Japan Science and Technology Agency,and the ATR Computational Neuroscience Laborato-ries.

References

[1] M. Vukobratovic, B. Borovac, D. Surla, D. Stokic, BipedLocomotion—Dynamics, Stability, Control and Application,Springer, 1990.

[2] S. Kagami, T. Kitagawa, K. Nishiwaki, T. Sugihara, M. Inaba,H. Inoue, A fast dynamically equilibrated walking trajectorygeneration method of humanoid robot, Autonomous Robots12 (2002) 71–82.

[3] A. Takanishi, M. Tochizawa, H. Karaki, I. Kato, Dynamicbiped walking stabilized with optimal trunk and waist motion,in: Proceedings of the IEEE/RSJ International Workshop onIntelligent Robots and Systems, 1989, pp. 561–566.

[4] K. Hirai, M. Hirose, Y. Haikawa, T. Takenaka, Thedevelopment of honda humanoid robot, in: Proceedings of theIEEE International Conference on Robotics and Automation,1998, pp. 1321–1326.

[5] S. Kagami, F. Kanehiro, Y. Tamiya, M. Inaba, H. Inoue,Autobalancer: an online dynamic balance compensationscheme for humanoid robots, in: B.R. Donald, K. Lynch, D.Rus (Eds.), Algorithmic and Computational Robotics: NewDirections, A K Peters Ltd., 2001, pp. 329–340.

[6] J. Yamaguchi, A. Takanishi, I. Kato, Development of abiped walking robot compensating for three-axis moment bytrunk motion, in: Proceedings of the IEEE/RSJ InternationalConference on Intelligent Robots and Systems, 1993,pp. 187–192.

[7] K. Yoneda, S. Hirose, Tumble stability criterion of integratedlocomotion and manipulation, in: Proceedings of theIEEE/RSJ International Conference on Intelligent Robots andSystems, 1996, pp. 870–876.

[8] S. Mochon, T.A. McMahon, Ballistic walking, Journal ofBiomechanics 13 (1980) 49–57.

[9] K. Matsuoka, Sustained oscillations generated by mutuallyinhibiting neurons with adaptation, Biological Cybernetics 52(1985) 367–376.

[10] G. Taga, Y. Yamaguchi, H. Shimizu, Self-organized controlof bipedal locomotion by neural oscillators in unpredictableenvironment, Biological Cybernetics 65 (1991) 147–159.

[11] S. Miyakoshi, G. Taga, Y. Kuniyoshi, A. Nagakubo,Three dimensional bipedal stepping motion using neuraloscillators—towards humanoid motion in the real world, in:

Page 11: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 89

Proceedings of the IEEE/RSJ International Conference onIntelligent Robots and Systems, 1998, pp. 84–89.

[12] K. Hase, N. Yamazaki, Computational evolution of humanbipedal walking by a neuro-musculo-skeletal model, ArtificialLife and Robotics 3 (1999) 133–138.

[13] G. Endo, J. Morimoto, J. Nakanishi, G. Cheng, An empiricalexploration of a neural oscillator for biped locomotion, in:Proceedings of the IEEE International Conference on Roboticsand Automation, 2004, pp. 3036–3042.

[14] Y. Fukuoka, H. Kimura, A.H. Cohen, Adaptive dynamicwalking of a quadruped robot on irregular terrain basedon biological concepts, International Journal of RoboticsResearch 22 (3–4) (2003) 187–202.

[15] S. Miyakoshi, M. Yamakita, K. Furuta, Juggling controlusing neural oscillators, in: Proceedings of the IEEE/RSJInternational Conference on Intelligent Robots and Systems,1994, pp. 1186–1193.

[16] S. Kotosaka, S. Schaal, Synchronized robot drummingby neural oscillator, in: Proceedings of the InternationalSymposium on Adaptive Motion of Animals and Machines,2000.

[17] M.M. Williamson, Neural control of rhythmic armmovements, Neural Networks 11 (1998) 1379–1394.

[18] J. Pratt, G. Pratt, Intuitive control of a planar bipedal walkingrobot, in: Proceedings of the IEEE International Conferenceon Robotics and Automation, 1998, pp. 2014–2021.

[19] H. Miyamoto, S. Schaal, F. Gandolfo, Y. Koike, R. Osu, E.Nakano, Y. Wada, M. Kawato, A kendama learning robotbased on bi-directional theory, Neural Networks 9 (1996)1281–1302.

[20] H. Miyamoto, M. Kawato, A tennis serve and upswinglearning robot based on bi-directional theory, NeuralNetworks 11 (1998) 1317–1329.

[21] S. Schaal, Is imitation learning the route to humanoid robots?Trends in Cognitive Sciences 3 (6) (1999) 233–242.

[22] A. Ijspeert, J. Nakanishi, S. Schaal, Learning attractorlandscapes for learning motor primitives, in: S. Becker,S. Thrun, K. Obermayer (Eds.), Advances in NeuralInformation Processing Systems, vol. 15, MIT-Press, 2003,pp. 1547–1554.

[23] E. Klavins, D.E. Koditschek, Phase regulation of decentralizedcyclic robotic systems, International Journal of RoboticsResearch 21 (3) (2002) 257–275.

[24] S. Ito, H. Yuasa, Z. wei Luo, M. Ito, D. Yanagihara,A mathematical model of adaptation in rhythmic motionto environmental changes, in: Proceedings of the IEEEInternational Conference on Systems, Man and Cybernetics,1997, pp. 275–280.

[25] K. Tsujita, K. Tsuchiya, A. Onat, Adaptive gait pattern controlof a quadruped locomotion robot, in: Proceedings of theIEEE/RSJ International Conference on Intelligent Robots andSystems, 2001, pp. 2318–2325.

[26] K. Tsuchiya, S. Aoi, K. Tsujita, Locomotion control ofa biped locomotion robot using nonlinear oscillators, in:Proceedings of the IEEE/RSJ International Conference onIntelligent Robots and Systems, 2003, pp. 1745–1750.

[27] M. Kawato, Transient and steady state phase response curvesof limit cycle oscillators, Journal of Mathematical Biology12 (1981) 13–30.

[28] J. Peters, S. Vijayakumar, S. Schaal, Reinforcement learningfor humanoid robotics, in: Proceedings of the ThirdIEEE-RAS International Conference on Humanoid Robots,2003.

[29] J. Morimoto, G. Zeglin, C.G. Atkeson, Minimax differentialdynamic programming: application to a biped walking robot,in: Proceedings of the IEEE/RSJ International Conference onIntelligent Robots and Systems, 2003, pp. 1927–1932.

[30] S. Schaal, C.G. Atkeson, Constructive incremental learningfrom only local information, Neural Computation 10 (8)(1998) 2047–2084.

[31] A.J. Ijspeert, A connectionist central pattern generator forthe aquatic and terrestrial gaits of a simulated salamander,Biological Cybernetics 84 (5) (2001) 331–348.

[32] Y. Kuramoto, Chemical Oscillations, Waves, and Turbulence,Springer-Verlag, 1984.

[33] K. Sekiyama, J. Nakanishi, I. Takagawa, T. Higashi, T.Fukuda, Self-organizing control of urban traffic signalnetwork, in: Proceedings of the IEEE InternationalConference on Systems, Man and Cybernetics, 2001,pp. 2481–2486.

[34] T. Yamasaki, T. Nomura, S. Sato, Possible functional roles ofphase resetting during walking, Biological Cybernetics 88 (6)(2003) 468–496.

[35] Y. Ehara, S. Yamamoto, Introduction to Body-Dynamics—Analysis of Gait and Gait Initiation, Ishiyaku Publishers, 2002(in Japanese).

[36] J.K. Hodgins, Biped gait transitions, in: Proceedings of theIEEE International Conference on Robotics and Automation,1991, pp. 2092–2097.

Jun Nakanishi received the B.E. andM.E. degrees both in mechanical engi-neering from Nagoya University, Nagoya,Japan, in 1995 and 1997, respectively. Hereceived the Ph.D. degree in engineer-ing from Nagoya University in 2000. Healso studied in the Department of Elec-trical Engineering and Computer Scienceat the University of Michigan, Ann Ar-bor, USA, from 1995 to 1996. He was

a Research Associate at the Department of Micro System Engi-neering, Nagoya University, from 2000 to 2001, and was a presi-dential postdoctoral fellow at the Computer Science Department,the University of Southern California, Los Angeles, USA, from2001 to 2002. He joined ATR Human Information Science Lab-oratories, Kyoto, Japan, in 2002. He is currently a researcher atATR Computational Neuroscience Laboratories and with the Com-putational Brain Project, ICORP, Japan Science and TechnologyAgency. His research interests include motor learning and controlin robotic systems. He received the IEEE ICRA 2002 Best PaperAward.

Page 12: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

90 J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91

Jun Morimoto received his B.E. incomputer-controlled mechanical systemsfrom Osaka University, Osaka, Japan, in1996, M.E. in information science fromNara Institute of Science and Technol-ogy, Nara, Japan, in 1998, and Ph.D. ininformation science from Nara Instituteof Science and Technology, Nara, Japan,in 2001. He was a Research Assistant atKawato Dynamic Brain Project, ERATO,

JST, from 1999 to 2001. He was a postdoctoral fellow at theRobotics Institute, Carnegie Mellon University, Pittsburgh, PA,from 2001 to 2002. He is currently a researcher at ATR Com-putational Neuroscience Laboratories, Kyoto, Japan, and with theComputational Brain Project, ICORP, Japan Science and Technol-ogy Agency. He is a member of Japanese Neural Network Soci-ety, and Robotics Society of Japan. His research interests includereinforcement learning and robotics.

Gen Endo received his B.E. and M.E.degrees both in Mechano-Aerospace En-gineering from Tokyo Institute of Tech-nology, Tokyo, Japan, in 1996 and 1998,respectively. He received the Ph.D. de-gree in engineering from Tokyo Instituteof Technology in 2000. He joined SonyCorporation, Tokyo, Japan, in 2000. Heis currently a collaborative researcher atATR Computational Neuroscience Labo-

ratories, Kyoto, Japan. He is a member of Robotics Society ofJapan. He received the Best Paper Award of Robotics Society ofJapan in 2002. His research interests include biologically inspiredmobile robot, mechanical design and control.

Gordon Cheng is the head of the Depart-ment of Humanoid Robotics and Com-putational Neuroscience, ATR Computa-tional Neuroscience Laboratories, Kyoto,Japan, and with the Computational BrainProject, ICORP, Japan Science and Tech-nology Agency. Before taking up thisposition, he held fellowships from theCenter of Excellence (COE), Science, andTechnology Agency (STA) of Japan. Both

of these fellowships were taken at the Humanoid Interaction Labo-ratory, Intelligent Systems Division at the ElectroTechnical Labo-ratory (ETL), Japan. At ETL he played a major role in developinga completely integrated humanoid robotics system. He receiveda Ph.D. in systems engineering from the Department of SystemsEngineering, The Australian National University, and Bachelorand Master degrees in Computer Science from the University ofWollongong, Australia. His industrial experience includes consul-tancy to and as a national systems manager for a major transport

company. He was also the director of the company, G.T.I. Com-puting, specializing in networking/transport management systemsin Australia. His research interests include humanoid robotics,biomimetic of human vision, computational neuroscience of vi-sion, action understanding, human-robot interaction, active vision,mobile robot navigation and object-oriented software construction.He is a society member of the IEEE Robotics & Automation andComputer Society. He is on the editorial board of the InternationalJournal of Humanoid Robotics. He is a program co-chair for thenext IEEE International Conference on Humanoid Robots.

Stefan Schaal is an Associate Professorat the Department of Computer Scienceand the Neuroscience Program at theUniversity of Southern California, and anInvited Researcher at the ATR Computa-tional Neuroscience Laboratory in Japan,where he held an appointment as Headof the Computational Learning Groupduring an international ERATO project,the Kawato Dynamic Brain Project (ER-

ATO/JST). He is also an Adjunct Assistant Professor at theDepartment of Kinesiology of the Pennsylvania State University.Before joining USC, Dr. Schaal was a postdoctoral fellow at theDepartment of Brain and Cognitive Sciences and the ArtificialIntelligence Laboratory at MIT, an Invited Researcher at theATR Human Information Processing Research Laboratories inJapan, and an Adjunct Assistant Professor at the Georgia Instituteof Technology. Dr. Schaal’s research interests include topics ofstatistical and machine learning, neural networks, computationalneuroscience, non-linear dynamics, non-linear control theory, andbiomimetic robotics. He applies his research to problems of ar-tificial and biological motor control and motor learning, focusingon both theoretical investigations and experiments with humansubjects and anthropomorphic robot equipment.

Mitsuo Kawato received the B.S. de-gree in physics from Tokyo University in1976, the M.E. and Ph.D. degrees in bio-physical engineering from Osaka Univer-sity in 1978 and 1981, respectively. From1981 to 1988 he was a faculty memberand lecturer of Osaka University. From1992 he became a department head ofDepartment 3, ATR Human InformationProcessing Research Laboratories. From

2003, Director of ATR Computational Neuroscience Laboratories.From 2004, he has been jointly appointed as the Director of theComputational Brain Project, ICORP, JST. From 1996 to 2001he was jointly appointed as a director of Kawato Dynamic BrainProject, ERATO, JST. He has been jointly appointed as visitingprofessor of Kanazawa Institute of Technology, Nara Institute ofScience and Technology and Osaka University. For the last 15

Page 13: Learning from demonstration and adaptation of biped …cga/papers/nakanishi-ras04.pdfpatterns can be easily modified by scaling the param-eters r0, ω(= 1/τ) and ym individually.

J. Nakanishi et al. / Robotics and Autonomous Systems 47 (2004) 79–91 91

years he has been working in computational neuroscience. Hewas awarded Yonezawa founder’s medal memorial special awardof the Institute of Electronics, Information and CommunicationEngineers in 1991, outstanding research award of the InternationalNeural Network Society in 1992, Osaka Science Prize in 1993,

10th Tsukahara Naka-akira Memorial Award in 1996 and TokizaneToshihiko memorial award. He is a governing board memberof Japanese Society of Neuroscience and Japan Neural NetworkSociety and Member of Executive Committee of InternationalAssociation for the Study of Attention and Performance.