
Research Article

Robot Obstacle Avoidance Learning Based on Mixture Models

Huiwen Zhang,1,2 Xiaoning Han,1,2 Mingliang Fu,1,2 and Weijia Zhou1

1 State Key Laboratory of Robotics, Shenyang Institute of Automation, Shenyang 110016, China
2 University of Chinese Academy of Sciences, Beijing, China

Correspondence should be addressed to Huiwen Zhang; zhanghuiwen@sia.cn

Received 27 May 2016; Accepted 21 July 2016

Academic Editor: Keigo Watanabe

Copyright © 2016 Huiwen Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Journal of Robotics, Volume 2016, Article ID 7840580, 14 pages, http://dx.doi.org/10.1155/2016/7840580

We briefly survey existing obstacle avoidance algorithms; then a new obstacle avoidance learning framework based on learning from demonstration (LfD) is proposed. The main idea is to imitate the obstacle avoidance mechanism of human beings, in which humans learn to make decisions based on sensor information obtained by interacting with the environment. Firstly, we endow robots with obstacle avoidance experience by teaching them to avoid obstacles in different situations. In this process, a large amount of data is collected as a training set; then, a Gaussian mixture model (GMM) is used to encode the training data, which is equivalent to extracting the constraints of the task. Secondly, a smooth obstacle-free path is generated by Gaussian mixture regression (GMR). Thirdly, a metric of imitation performance is constructed to derive a proper control policy. The proposed framework shows excellent generalization performance, which means that the robots can fulfill obstacle avoidance tasks efficiently in a dynamic environment. More importantly, the framework allows learning a wide variety of skills, such as grasping and manipulation, which makes it possible to build a robot with versatile functions. Finally, simulation experiments are conducted on a Turtlebot robot to verify the validity of our algorithms.

1. Introduction

With the development of robot technology, especially the development of computer science and cognitive science, some promising technologies have emerged, such as deep learning and computer vision. These technologies greatly improve the intelligence level of robots and also lay the foundation for modern intelligent robots. Unlike traditional mature industrial robots, intelligent robots are expected to coexist with people, for example, as service robots. To achieve this purpose, robots are required to have strong environmental adaptability and autonomous decision-making ability. Obstacle avoidance and planning are basic skills of robots and are also a research focus in the field of robotics. There are already many mature algorithms for obstacle avoidance, but how to develop obstacle avoidance algorithms with strong robustness, especially algorithms that let robots fulfill tasks in unstructured environments, is still worthy of further research.

Obstacle avoidance algorithms can be classified into traditional obstacle avoidance algorithms and intelligent algorithms based on their development history. In [1] the authors refer to them as classical approaches and heuristic approaches; in [2] the authors call them classical algorithms and evolutionary algorithms. Here the traditional approaches cover the same content, while the intelligent algorithms include some machine learning methods applied to robot obstacle avoidance in recent years, which differs from the terms coined by other authors and covers a wider range. Classical obstacle avoidance algorithms can be categorized into three classes based on their main idea: (i) methods based on geometric models, (ii) methods based on virtual fields, and (iii) methods based on mathematical programming. Details of classic obstacle avoidance algorithms can be found in [3]. Methods based on geometric models proceed in two steps: first build a geometric model of the environment, and then choose the right kind of search algorithm. The main idea of virtual field methods is to assume that all robots stay in a virtual field; the change of the virtual field reflects the feasible solution for the path. This concept was first put forward and used in [4, 5]. Mathematical programming methods represent the constraints for obstacles as parameterized inequalities in configuration space, so



a geometric problem is turned into a mathematical one: the best obstacle-free path is a solution of the inequalities, which can be found by optimization methods. All these methods have been used successfully, but the traditional methods are generally only applicable to static environments and easily fall into a local optimum. To solve these problems, more and more intelligent approaches are used. By 2003, more than half of the articles used heuristic algorithms to solve the obstacle avoidance problem [1].

The core idea of heuristic algorithms is to simulate intelligent behaviors of animals or people; the algorithms are designed by bionic mechanisms. There are a large number of representative algorithms, such as the Ant Colony Optimization (ACO) algorithm, the Particle Swarm Optimization (PSO) algorithm, and the Genetic Algorithm (GA). In [6], ACO is used as a global path planning method to avoid obtaining a local optimal solution when using the potential field method. In [7], distance and memory features are introduced to speed up the computation of the ACO. The emergence of PSO derived from the simulation of birds' migrating behavior; applications of particle swarm optimization to the path planning problem include [8-10]. GA, derived from the theory of evolution and genetics, is a kind of global probabilistic optimization algorithm. In [11], a special encoding technique is used to speed up path generation; the fitness function is designed according to security and distance relative to a target, and then a barrier-free path is solved through a genetic algorithm. In [12], a variable-length chromosome genetic algorithm is designed for path planning. In [13], the authors use an artificial neural network to solve the obstacle avoidance problem of a redundant manipulator. An improved Grossberg neural network method is used in [14] to solve UAV path planning under a "danger mode". Neural networks can also realize obstacle avoidance in conjunction with other methods, such as fuzzy neural network learning [15]. Among all the intelligent algorithms, there is a broad category of approaches based on learning; a representative line of work uses reinforcement learning. Reinforcement learning derives from psychology: the main idea is to maximize the reward defined by a reward function through constant interaction with the environment and "trial and error" [16]. Based on this concept, many obstacle avoidance algorithms have been proposed [17-22]. It is worth noting that, with great improvements in hardware and the development of deep learning theory, some methods can directly learn an obstacle avoidance strategy from sensor data, called end-to-end learning [23]. Intelligent algorithms usually can obtain the global optimal solution and adapt well to the environment, but these methods often bear high computational complexity and poor real-time performance.

In this paper, we propose an obstacle avoidance method based on teaching and learning. First, learning samples are obtained by showing robots how to avoid obstacles; then a probabilistic mixture model is used to extract and encode the task constraints autonomously. Finally, an obstacle-free path is generated by GMR. Our approach differs from previous work in the following: (i) it is more in line with the habit of human cognition, drawing lessons from imitation learning; (ii) the algorithm has good generalization ability and can adapt to dynamic unstructured environments; besides, the computational complexity is low, and in particular the path generation process can be realized online efficiently; (iii) the learning framework not only can be used to learn obstacle avoidance but also can be used for learning other skills, which is a key point to achieve generalized artificial intelligence (GAI).

The main contributions of the paper are as follows: (i) the first contribution is using LfD for the obstacle avoidance task for the first time, making up for the disadvantages of other methods; (ii) due to the nature of LfD, nontechnical people can program the robots to realize the obstacle avoidance function; (iii) obstacle avoidance learning using mixture models is able to adapt to environmental changes and dynamic obstacles and at the same time has low computational complexity.

The remainder of the paper is organized as follows. In Section 2, some background knowledge is introduced, including the concepts of LfD and GMM. Section 3 introduces parameter-solving methods for GMM. Section 4 gives a detailed introduction of the obstacle avoidance learning architecture. In Section 5, simulation experiments based on a Turtlebot robot are conducted. In the end, conclusions are drawn and further improvements are introduced.

2. Learning from Demonstration

LfD is a paradigm which allows robots to automatically acquire new skills. Unlike traditional robots, for which a task must first be decomposed and each action programmed by humans, LfD adopts the view that a robot's skills can be acquired by observing human operation. This approach has the advantage of simple programming: even people who do not understand programming techniques can program robots through teaching. Besides, due to the nature of LfD, robots are better positioned to adapt to new environments. LfD is also named robot Programming by Demonstration (PbD), Imitation Learning, Apprenticeship Learning, Learning by Demonstration (LbD), Learning by Showing, Learning by Watching, Learning from Observation, and so forth. It is worth noting that some authors think that imitation learning and LfD have different meanings: in [24] the authors distinguish imitation learning and LfD according to how the learning data is recorded and how the imitator uses the data in the learning process.

2.1. Formulation of LfD Problem. The solving process for LfD can be divided into two steps: first, acquire training data; second, use the obtained data to derive a control policy. We use $S$ and $A$ to represent the state set and the action set, respectively. The probability distribution function $T(s' \mid s, a): S \times A \times S \rightarrow [0, 1]$ represents the transfer probability from state $(s, a)$ to state $s'$. An experience dataset $D$ should include this information. Actually, in most situations the true states $S$ are unknown; the observable states are $Z$, and then a learning problem can be defined as deriving a policy $\pi$ mapping from $Z$ to $A$. For an obstacle avoidance task, the problem can be described as finding the more general barrier-free path (policy) from the demonstrated barrier-free paths (samples $D$). The main process of the LfD framework is illustrated in Figure 1.

Figure 1: Framework of learning from demonstration (LfD): demonstrations produce a dataset $(S, A, T)$, from which a policy $\pi$ is derived and executed on the robot system.
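To fix the notation in code form, the following is a minimal Python sketch of the LfD setup. The types and the nearest-neighbor baseline are purely illustrative (standing in for the mapping-function idea discussed in Section 2.2, which the paper later replaces with GMM/GMR); none of these names come from the paper.

```python
from typing import Callable, List, Sequence, Tuple
import math

Observation = Tuple[float, ...]                    # z in Z: what the sensors report
Action = Tuple[float, float]                       # a in A: e.g. (linear, angular) velocity
Demonstration = List[Tuple[Observation, Action]]   # one recorded barrier-free trajectory

def derive_policy(dataset: Sequence[Demonstration]) -> Callable[[Observation], Action]:
    """Derive a policy pi: Z -> A from demonstrations. As the simplest possible
    baseline, return the action of the nearest demonstrated observation."""
    pairs = [pair for demo in dataset for pair in demo]

    def pi(z: Observation) -> Action:
        # Nearest-neighbor lookup over all demonstrated (observation, action) pairs.
        _, action = min(pairs, key=lambda p: math.dist(p[0], z))
        return action

    return pi
```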

2.2. Main Research Topics in LfD. Nehaniv and Dautenhahn [25] summarized the main problems in LfD as four issues: (1) What to imitate? (2) How to imitate? (3) When to imitate? (4) Whom to imitate? So far only the former two problems have been solved. For the problem of robot obstacle avoidance, "what to imitate" means the robot needs to extract the right constraints which characterize the task, and "how to imitate" refers to generating a trajectory that satisfies the constraints obtained in the previous step.

In order to solve the above problems (mainly the former two), there are three main ideas:

(i) Directly learn a mapping function $\pi = f(z)$ from the training set $D = \{(z_i, a_i)\}$, where $z$ stands for the observed state. This idea treats the problem as a supervised learning problem.

(ii) Learn a system model, namely, the return function and the state transition probability distribution, from the training set, and then derive a policy using MDP methods.

(iii) Learn the pre- or postconditions of each action and then obtain a control sequence using planning methods.

For the obstacle avoidance problem, learning from the demonstrated barrier-free paths of teachers to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model which is well suited to modeling the natural variations in human and robot motions. Theoretically, a GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so GMM has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated within the same GMM and GMR encoding strategy. It is therefore very favorable for us to use GMM to encode our prior demonstration information and to use GMR as the solution to the regression problem mentioned above. Details of GMM theory are described below.

3. Gaussian Mixture Model

For a probability density estimation problem, if in addition to the known observation variables there exist some unobservable latent variables, but we know the distribution of the latent variable and the conditional distribution of the observable variable given the latent variable, then we can use mixture models to estimate the parameters of the distribution, so as to get the approximate probability distribution of the known data. Before introducing GMM, we digress briefly to the Gaussian distribution.

3.1. Marginal and Conditional Distribution of a Multivariate Gaussian Distribution. Assume that we have a random variable

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{1}$$

in which $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, $x \in \mathbb{R}^{r+s}$, and $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{2}$$

and $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\mu \in \mathbb{R}^{r+s}$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, and $\Sigma_{12} \in \mathbb{R}^{r \times s}$. Noting that the covariance matrix is symmetric, we have $\Sigma_{12} = \Sigma_{21}^{T}$.

Because the marginal distribution of a Gaussian is also a Gaussian distribution, it is easy to get that $E[x_1] = \mu_1$, $\operatorname{cov}(x_1) = \Sigma_{11}$, $E[x_2] = \mu_2$, and $\operatorname{cov}(x_2) = \Sigma_{22}$, so we have $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$ and $x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22})$. As a step forward, we can prove $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}\left(x_2 - \mu_2\right), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}. \tag{3}$$
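As a concrete companion to (3), here is a short NumPy sketch (our own helper, not from the paper) that computes the conditional mean and covariance of one block of a joint Gaussian given the other; `idx1` and `idx2` are assumed index arrays selecting the $x_1$ and $x_2$ blocks.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, x2, idx1, idx2):
    """Parameters of x1 | x2 for a joint Gaussian N(mu, Sigma), per Eq. (3)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # mu_{1|2} = mu1 + S12 S22^{-1} (x2 - mu2); solve() avoids an explicit inverse.
    mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    # Sigma_{1|2} = S11 - S12 S22^{-1} S21, with S21 = S12^T by symmetry.
    Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)
    return mu_cond, Sigma_cond
```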

3.2. Formulation of GMM. Assume that we have a training set $D = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ containing $m$ samples, where every sample $x^{(i)}$ is a vector. We are trying to find a density function that describes the data with the least error, which is equivalent to maximizing the probability of the data. In order to represent this probability, we need to make some assumptions. First, we assume that the latent variable $z^{(i)} \sim \text{Multinomial}(\phi)$, in which $\phi_j \geq 0$ and $\sum_{j=1}^{k} \phi_j = 1$; besides, we suppose $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. So we can model the data in our hands by adjusting the parameters $(\phi, \mu, \Sigma)$.

Figure 2: System architecture of obstacle avoidance learning: data acquisition (demonstration and recording of odometry and vision data, $x$, $\omega$, $d$), preprocessing (denoising, invalid data detection, sequence alignment), encoding and retrieval with GMM/GMR ($\xi_j$, $\mu_k$, $\Sigma_k$; "what to imitate" and "how to imitate"), and control policy derivation (metric of imitation, optimization), whose outputs are the linear velocity $\dot{\xi}_{x,s}$ and angular velocity $\dot{\xi}_{\theta,s}$ sent to the robot.

The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p\left(x^{(i)}; \phi, \mu, \Sigma\right) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p\left(x^{(i)} \mid z^{(i)}; \mu, \Sigma\right) p\left(z^{(i)}; \phi\right). \tag{4}$$

Maximizing (4) yields the right model parameters. To realize this purpose, typically we take the partial derivative with respect to each parameter and set it equal to zero, but we find that we cannot get a closed-form solution for this problem using the above method. The main reason is that $z$ is unknown; if $z$ were known, the model would degrade to a Gaussian discriminant model, and following the previously mentioned steps we could obtain a closed-form solution. We will introduce a commonly used iterative algorithm to solve the problem in (4).

3.3. EM for GMM. Expectation Maximization (EM) is an iterative algorithm mainly including two steps, called the E-step and the M-step. The main idea is to find a tight lower-bound function of the objective function and then maximize that lower bound, so that the objective function is maximized indirectly. The reason for introducing a lower-bound function is that finding its extreme value is much easier compared to the original function. For GMM, the E-step tries to guess the value of $z^{(i)}$, equivalent to finding out which component the current sample belongs to, and then the model parameters are updated based on this conjecture. The algorithm can be summarized as follows.

E-step: for each $i$ and $j$, compute the posterior probability of $z^{(i)}$:

$$w_j^{(i)} := p\left(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma\right). \tag{5}$$

M-step: update the model parameters:

$$\phi_j := \frac{1}{m}\sum_{i=1}^{m} w_j^{(i)}, \qquad \mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}}, \qquad \Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)} \left(x^{(i)} - \mu_j\right)\left(x^{(i)} - \mu_j\right)^{T}}{\sum_{i=1}^{m} w_j^{(i)}}. \tag{6}$$

Note that the value of $w_j^{(i)}$ can be computed using the following equation:

$$w_j^{(i)} = \frac{p\left(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma\right) p\left(z^{(i)} = j; \phi\right)}{\sum_{l=1}^{k} p\left(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma\right) p\left(z^{(i)} = l; \phi\right)}. \tag{7}$$

Iterate between (6) and (7) until convergence, which means that the changes of the model parameters are less than a threshold $\epsilon$ or the value of the likelihood function changes only within a very narrow range.
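To make (5)-(7) concrete, the following is a minimal NumPy/SciPy sketch of the EM iteration for a GMM. This is an illustrative implementation, not the authors' code; the function name, the random-sample initialization, and the small regularization added to the covariances are our own choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=200, tol=1e-6, seed=0):
    """Fit a K-component GMM to the (m, D) data matrix X with EM, Eqs. (5)-(7)."""
    rng = np.random.default_rng(seed)
    m, D = X.shape
    phi = np.full(K, 1.0 / K)                        # mixing weights phi_j
    mu = X[rng.choice(m, size=K, replace=False)]     # means: K random samples
    Sigma = np.stack([np.cov(X, rowvar=False) + 1e-6 * np.eye(D)] * K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log p(x^(i), z^(i)=j) for every i, j -- the numerator of Eq. (7).
        log_joint = np.column_stack([
            np.log(phi[j]) + multivariate_normal.logpdf(X, mu[j], Sigma[j])
            for j in range(K)
        ])
        ll = np.logaddexp.reduce(log_joint, axis=1)  # per-sample log-likelihood, Eq. (4)
        W = np.exp(log_joint - ll[:, None])          # responsibilities w_j^(i), Eq. (7)
        # M-step: closed-form parameter updates, Eq. (6).
        Nj = W.sum(axis=0)
        phi = Nj / m
        mu = (W.T @ X) / Nj[:, None]
        for j in range(K):
            Xc = X - mu[j]
            Sigma[j] = (W[:, j, None] * Xc).T @ Xc / Nj[j] + 1e-6 * np.eye(D)
        if ll.sum() - prev_ll < tol:                 # likelihood barely changed: stop
            break
        prev_ll = ll.sum()
    return phi, mu, Sigma
```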

4. The Architecture of Obstacle Avoidance Learning

Figure 2 shows the architecture of obstacle avoidance learning and the information flow between the different modules. The whole system contains three modules: data acquisition and preprocessing, constraint extraction and trajectory retrieval, and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by a Kinect. Through noise filtering, invalid data reduction, and sequence alignment of the original data, we get the input for the learning machine. Then the input signal is encoded by GMM, and a general trajectory is generated by GMR; the general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. The subsequent sections describe each part in detail.

Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory from start to goal; the fan-shaped sectors in shallow yellow represent the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.

4.1. Data Acquisition. The experimental data is acquired from a Turtlebot robot. Turtlebot is a widely used research platform which is compatible with ROS, integrating a variety of sensors such as an odometer and extensible vision sensors. Many kinds of applications can be developed with Turtlebot, such as mapping and navigation. In this paper, we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit for navigation algorithms, simulating some trajectories as training samples; and (iii) by means of the follower apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record the sensor data as samples. In this paper, we adopt the second method: we create a simulated environment in software and then make the robot execute multiple obstacle avoidance tasks in the simulated environment using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors like the odometer. Using the Kinect, we can get the distance between the robot and obstacles as well as the distance from the robot to the target.

We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity, (ii) the robot's absolute location, (iii) the distance between the robot and obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle in a specific range, of which we choose the shortest distance as our reference value, and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the above sensor information, respectively. Here the location information and velocity information are redundant. We think it is necessary to take both kinds of information into consideration, for the reason that location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally very necessary in the initial stage of a task.

In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ stands for the current orientation, and $d$ stands for distance. Each variable contains a one-dimensional temporal value and $(d-1)$-dimensional spatial values; that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$. $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega \in \mathbb{R}$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and obstacles, and $d_{rg,s} \in \mathbb{R}^{N \times 2}$ represents the distance between the robot and the goal state. Supposing the coordinates of the goal are $o_{s,i}$, we have

$$d_{rg,s} = x_{s,ij} - o_{s,i}, \quad \forall i \in \{1, \ldots, n\},\ \forall j \in \{1, \ldots, T\}, \tag{8}$$

where $i$ indicates the trajectory number and $j$ indicates the time step within a trajectory. We can rewrite the above equation for velocities:

$$\dot{d}_{rg,s} = \dot{x}_{s,ij} - \dot{o}_{s,i}. \tag{9}$$
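For illustration, (8) and (9) can be evaluated over a whole demonstration set in a few lines. The array shapes below are assumptions matching the notation above, and `np.gradient` stands in for the time derivative on sampled data; the function name is ours.

```python
import numpy as np

def robot_goal_distances(x_s, o_s):
    """Spatial robot-to-goal vectors d_rg,s of Eq. (8) and their rates, Eq. (9).

    x_s: (n, T, 2) demonstrated positions; o_s: (n, 2) goal coordinates."""
    d_rg = x_s - o_s[:, None, :]           # d_rg,s = x_s,ij - o_s,i, broadcast over j
    d_rg_dot = np.gradient(d_rg, axis=1)   # finite-difference analogue of Eq. (9)
    return d_rg, d_rg_dot
```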

4.2. Data Preprocessing. Because of noise and bad demonstration quality, there exist serious mismatches and invalid data in the original data. As shown in Figure 4, there is no change at the beginning and end parts of these curves, so this part of the data is useless; to reduce computation cost, it is wise to discard it. In addition, each trajectory has a different length; we should resample them to a uniform length so that we can handle them more easily in the subsequent steps. So, before inputting the data to our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
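A sketch of the preprocessing described here, under assumed conventions: the idle-motion threshold `eps` and the target length `T` are illustrative, and the resampling is plain linear interpolation.

```python
import numpy as np

def preprocess(traj, T=200, eps=1e-3):
    """Trim the idle head/tail of one demonstration and resample it to length T.

    traj: (L, D) raw recording; a sample counts as idle when it barely differs
    from its neighbor."""
    step = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    active = np.where(step > eps)[0]
    if active.size:                        # drop the flat segments at both ends
        traj = traj[active[0]:active[-1] + 2]
    # Linear-interpolation resampling to a uniform length.
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, T)
    return np.column_stack([np.interp(t_new, t_old, traj[:, d])
                            for d in range(traj.shape[1])])
```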

4.3. Trajectory Encoding and Retrieval. Consider the datapoints in the training set $\{\xi_j = \{\xi_{t,j}, \xi_{s,j}\}\}_{j=1}^{N} \in D$, where $\xi_j$ is a $D$-dimensional vector. We encode the data with a GMM, supposing we have $K$ Gaussian components. The probability terms in (4) can then be represented as

$$p(k) = \pi_k, \qquad p(\xi_j \mid k) = \mathcal{N}(\xi_j; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^{D}\,|\Sigma_k|}}\, e^{-\frac{1}{2}\left(\xi_j - \mu_k\right)^{T} \Sigma_k^{-1} \left(\xi_j - \mu_k\right)}, \tag{10}$$

in which $\pi_k$, $\mu_k$, and $\Sigma_k$ are model parameters representing the prior probability, mean, and covariance, respectively.

Figure 4: Comparison of original data and processed data ($x$, $\omega$, $d_{ro}$, and $d_{rg}$ plotted against time $t$ in seconds). (a) shows the original data, with lines in red marking the cutting positions; (b) displays the valid data after processing.

To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges, we get the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we have encoded the trajectory data successfully.

We then want a general barrier-free trajectory: given an input, which can be a temporal signal or another kind of sensor data, a barrier-free path should be generated according to that input. A natural idea is to divide the variable $\xi$ into two parts $(\xi_t^{I}, \xi_t^{O})$, representing input and output, respectively; then the trajectory retrieval problem can be formalized as follows: given $\xi \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_i, \Sigma_i)$ and $\xi_t^{I}$, solve for the probability density of $\xi_t^{O}$. The solution to this problem is called Gaussian mixture regression: regarding $\xi_t^{I}$ as query points, $\xi_t^{O}$ can be estimated using regression methods. In this paper, we use superscripts $I$ and $O$ to stand for the input and output parts of each variable. With this notation, a block decomposition of the datapoint $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi_t^{I} \\ \xi_t^{O} \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu_k^{I} \\ \mu_k^{O} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{I} & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^{O} \end{bmatrix}. \tag{11}$$

Based on (3), for a time step $t$, the conditional mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}_k^{O}\left(\xi_t^{I}\right) = \mu_k^{O} + \Sigma_k^{OI} \left(\Sigma_k^{I}\right)^{-1} \left(\xi_t^{I} - \mu_k^{I}\right), \qquad \hat{\Sigma}_k^{O} = \Sigma_k^{O} - \Sigma_k^{OI} \left(\Sigma_k^{I}\right)^{-1} \Sigma_k^{IO}. \tag{12}$$

Taking all $K$ components into consideration, the distribution of $\xi_t^{O}$ given $\xi_t^{I}$ is

$$P\left(\xi_t^{O} \mid \xi_t^{I}\right) \sim \sum_{k=1}^{K} h_k\left(\xi_t^{I}\right) \mathcal{N}\left(\hat{\mu}_k^{O}\left(\xi_t^{I}\right), \hat{\Sigma}_k^{O}\right), \tag{13}$$

where

$$h_k\left(\xi_t^{I}\right) = \frac{\pi_k \mathcal{N}\left(\xi_t^{I} \mid \mu_k^{I}, \Sigma_k^{I}\right)}{\sum_{l=1}^{K} \pi_l \mathcal{N}\left(\xi_t^{I} \mid \mu_l^{I}, \Sigma_l^{I}\right)}. \tag{14}$$

Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\left(\xi_t^{O} \mid \xi_t^{I}\right) = \mathcal{N}\left(\hat{\mu}_t^{O}, \hat{\Sigma}_t^{O}\right), \quad \text{with} \quad \hat{\mu}_t^{O} = \sum_{i=1}^{K} h_i\left(\xi_t^{I}\right) \hat{\mu}_i^{O}\left(\xi_t^{I}\right), \qquad \hat{\Sigma}_t^{O} = \sum_{i=1}^{K} h_i\left(\xi_t^{I}\right) \left(\hat{\Sigma}_i^{O} + \hat{\mu}_i^{O}\left(\xi_t^{I}\right) \hat{\mu}_i^{O}\left(\xi_t^{I}\right)^{T}\right) - \hat{\mu}_t^{O} \hat{\mu}_t^{O\,T}. \tag{15}$$

Substituting $\xi_t^{I}$ into the equations described above to obtain $\xi_t^{O}$, a general trajectory is obtained. The trajectory not only avoids obstacles but also can be executed by robots due to its good smoothness properties.
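As an illustration of the retrieval step, the following sketch implements (12)-(15) on top of parameters fitted by the EM sketch above. The function name and the index-array interface (`in_dim`, `out_dim` select the input and output blocks) are our own.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr(phi, mu, Sigma, xi_in, in_dim, out_dim):
    """One GMR query: conditional mean and covariance of the output block."""
    K = len(phi)
    # Mixture responsibilities h_k of the input point, Eq. (14).
    h = np.array([phi[k] * multivariate_normal.pdf(
            xi_in, mu[k][in_dim], Sigma[k][np.ix_(in_dim, in_dim)])
        for k in range(K)])
    h /= h.sum()
    D_out = len(out_dim)
    mu_out = np.zeros(D_out)
    second_moment = np.zeros((D_out, D_out))
    for k in range(K):
        S_II = Sigma[k][np.ix_(in_dim, in_dim)]
        S_OI = Sigma[k][np.ix_(out_dim, in_dim)]
        S_OO = Sigma[k][np.ix_(out_dim, out_dim)]
        S_IO = Sigma[k][np.ix_(in_dim, out_dim)]
        # Per-component conditional mean and covariance, Eq. (12).
        mu_k = mu[k][out_dim] + S_OI @ np.linalg.solve(S_II, xi_in - mu[k][in_dim])
        Sig_k = S_OO - S_OI @ np.linalg.solve(S_II, S_IO)
        # Moment-matched mixture of the conditionals, Eq. (15).
        mu_out += h[k] * mu_k
        second_moment += h[k] * (Sig_k + np.outer(mu_k, mu_k))
    Sigma_out = second_moment - np.outer(mu_out, mu_out)
    return mu_out, Sigma_out
```

Sweeping `xi_in` over a grid of time stamps (for time-based retrieval) yields the general trajectory and its envelope shown in the figures below.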

4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory, which solves a planning problem. How can planning be combined with control? In other words, how do we find a set of control inputs that makes the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{O}$ is equal to $\hat{\xi}_s$, where $s$ denotes spatial location information. Using $\hat{\xi}_s$ to represent the desired trajectory and $\xi_s$ to represent the actually reached location, a simple way to evaluate the tracking performance is the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \left(\xi_{s,i} - \hat{\xi}_{s,i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^{T} W \left(\xi_s - \hat{\xi}_s\right). \tag{16}$$

Naturally, the above equation can be applied to multiple variables. In this paper we have four variables; letting $\hat{x}_s$, $\hat{\omega}_s$, $\hat{d}_{ro,s}$, $\hat{d}_{rg,s}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as

$$\begin{aligned} P = {} & \left(\hat{x}_s - C_x\right)^{T} W_x \left(\hat{x}_s - C_x\right) + \left(\hat{\omega}_s - C_\omega\right)^{T} W_\omega \left(\hat{\omega}_s - C_\omega\right) \\ & + \left(\hat{d}_{ro,s} - C_{d_{ro}}\right)^{T} W_{d_{ro}} \left(\hat{d}_{ro,s} - C_{d_{ro}}\right) + \left(\hat{d}_{rg,s} - C_{d_{rg}}\right)^{T} W_{d_{rg}} \left(\hat{d}_{rg,s} - C_{d_{rg}}\right), \end{aligned} \tag{17}$$

where

$$C_{*,ij} = \hat{*}_{s,ij} - \xi_{*,s,i(j-1)}, \quad \forall i \in \{1, \ldots, n\},\ \forall j \in \{1, \ldots, T\},\ \forall * \in \{x, \omega, d_{ro}, d_{rg}\}. \tag{18}$$

According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. But noting that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as follows:

$$L\left(\hat{x}_s, \hat{\omega}_s, \hat{d}_{ro,s}, \hat{d}_{rg,s}, \lambda\right) = P + \lambda^{T}\left(\hat{x}_s - \hat{d}_{rg,s}\right), \tag{19}$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to $\hat{x}_s$ and $\hat{d}_{rg,s}$, we find

$$2 W_x \left(\hat{x}_s - C_x\right) + \lambda = 0, \qquad 2 W_{d_{rg}} \left(\hat{d}_{rg,s} - C_{d_{rg}}\right) - \lambda = 0. \tag{20}$$

Adding the two equations in (20), we find

$$2 W_{d_{rg}} \left(\hat{d}_{rg,s} - C_{d_{rg}}\right) + 2 W_x \left(\hat{x}_s - C_x\right) = 0. \tag{21}$$

Substituting $\hat{x}_s = -\hat{d}_{rg,s}$ into the above equation, we find

$$\hat{x}_s = \left(W_x - W_{d_{rg}}\right)^{-1} \left(W_x C_x + W_{d_{rg}} C_{d_{rg}}\right). \tag{22}$$

The weight matrices in (22) can be computed by taking the inverse of the current covariance matrix, namely,

$$W = \left(\hat{\Sigma}_t^{O}\right)^{-1}. \tag{23}$$
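A minimal sketch of the closed-form solution (22) with weights from (23); the function and variable names are ours, and we assume $W_x - W_{d_{rg}}$ is invertible, as the derivation requires.

```python
import numpy as np

def optimal_positions(Sigma_x, Sigma_drg, C_x, C_drg):
    """Solve Eq. (22) for the commanded positions x_hat_s.

    Sigma_x, Sigma_drg: GMR output covariances for x and d_rg; their inverses
    give the weights of Eq. (23). C_x, C_drg: reference terms from Eq. (18)."""
    W_x = np.linalg.inv(Sigma_x)       # Eq. (23)
    W_drg = np.linalg.inv(Sigma_drg)
    # Eq. (22); solve() instead of explicitly inverting (W_x - W_drg).
    x_hat = np.linalg.solve(W_x - W_drg, W_x @ C_x + W_drg @ C_drg)
    d_rg_hat = -x_hat                  # the substitution used between (21) and (22)
    return x_hat, d_rg_hat
```

Iterating this solve at every time step, with the weights refreshed from the current GMR covariance, yields the control reference sequence evaluated in Section 5.2.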

5. Simulation Experiments

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map and a small gap lies on the lower side; the robot can reach the target place through the gap. The second map is much like the first one, except that there are two obstacles in the map; to reach the destination, the robot must avoid every obstacle. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the 3D simulation environments corresponding to the 2D maps. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning: warm colors such as yellow and red stand for a lower cost, and, on the contrary, cool color tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks because we put more emphasis on the mechanism behind obstacle avoidance: skills can be learned from a simple environment and then reused in complex situations.

Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software (with start and goal marked); (c) and (d) are the simulation environments corresponding to the maps above them.

Figure 6: Demonstrated trajectories in the two situations ($x$ versus $y$, in meters). (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.

Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.

5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance information $d_{rg}$; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as Cross Validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we choose $K$ using the Cross Validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM can successfully extract useful constraints characterizing the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen from the figures, $d_{rg}$ shares similar distribution properties with $x$. This is because the two variables have a specific relationship between each other, and so do the mean and covariance, which verifies the validity of the method indirectly.

Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; the ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; the shades in red represent the generation region.

Figure 8: Encoding and retrieval of the orientation and distance information ($\omega$, $d_{ro}$, $d_{rg,x}$, $d_{rg,y}$ against time) in task 1; the panels share the same meaning as the charts in Figure 7.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we can see, the coordinate values in the $X$ direction decrease monotonically to 0; this is because the robot keeps moving from left to right in the map. The coordinates in the $Y$ direction keep going up at the beginning and then drop to zero; this is because the robot encounters obstacles at the middle moment, and in order to avoid the obstacles it has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data; in order to encode these laws, more Gaussian components are required. In our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.

Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d$ in task 2.

5.2. Experiments for Finding the Control Strategy. Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), using the theory derived in the previous sections we can compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.

Figure 12 compares the calculated linear velocity and angular velocity with the desired values. The linear velocity is divided into velocity in the $X$ direction and velocity in the $Y$ direction. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for Turtlebot are a linear velocity in the forward direction, that is, the $X$ direction, and an angular velocity. The line graph for the velocity in $Y$ reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The velocity in $X$ follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) to achieve this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
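The paper does not specify its DTW implementation; for reference, here is a textbook O(nm) dynamic-programming sketch that returns the accumulated cost and the warping path used for alignment.

```python
import numpy as np

def dtw_align(a, b):
    """Dynamic time warping of two 1-D signals: accumulated cost and path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost plus the cheapest of the three admissible predecessors.
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```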

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel idea: a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make robots understand the task. In order to achieve this purpose, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.

Figure 10: Encoding and retrieval of the orientation and distance information ($\omega$, $d_{ro}$, $d_{rg,x}$, $d_{rg,y}$ against time) in task 2.

Figure 11: Tracking performance using the derived policy (expected path versus actual path).

Figure 12: Comparisons between the expected velocities and the calculated values. (a) Linear velocity; (b) angular velocity.

Future improvement work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, using the time signal as input; a more reasonable idea is to use state information as input, so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robots' exploration ability, so that better generalization performance is achieved.

Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning: a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.

[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.

[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.

[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.

[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.

[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.

[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.

[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.

[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.

[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.

[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.

[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.

[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.

[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.

[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.

[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.

[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.

[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.

[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.

[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.

[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.

[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.

[23] U. Muller, J. Ben, E. Cosatto et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.

[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.

[25] C. L. Nehaniv and K. Dautenhahn, "Like me? Measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.


Page 2: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

2 Journal of Robotics

a geometric problem is turned into a mathematical problemthe best obstacle-free path is a solution for the inequitieswhich can be solved by optimization methods All thesemethods have been successfully used but the traditionalmethod generally is only applicable to the static environmentand easy to fall into a local optimal solution To solve theseproblems more and more intelligent approaches are used By2003 more than half of articles use heuristic algorithms tosolve obstacle avoidance problem [1]

The core idea of heuristic algorithms is to simulate someintelligent behaviors of animals or people Algorithms aredesigned by bionic mechanism There are a large number ofrepresentative algorithms such as Ant Colony Optimization(ACO) algorithm Particle SwarmOptimization (PSO) Algo-rithm and Genetic Algorithm (GA) In [6] ACO is used asa global path planning method to avoid obtaining a localoptimal solution when using potential field method In [7]distance and memory feature are introduced to speed up thecomputation of the ACO The emergence of PSO derivedfrom the simulation of birdsmigrating behavior Applicationsof particle swarm optimization to solve the path planningproblem include [8ndash10] GA derived from the theory ofevolution and genetics is a kind of global probabilisticoptimization algorithm In [11] a special encoding technologyis used to speed up the path generation The fitness functionis designed according to security and distance relative to atarget and then a barrier-free path is solved through a geneticalgorithm In [12] a variable-length chromosome geneticalgorithm is designed for path planning In [13] the authorsuse artificial network to solve the obstacle avoidance problemof a redundant manipulator An improved Grossberg neuralnetwork method is used in [14] to solve the UAV pathplanning under a ldquodanger moderdquo Neural network can alsorealize the obstacle avoidance in conjunction with othermethods such as fuzzy neural network learning [15] Amongall the intelligent algorithms there are broad categoricalapproaches which are based on the learning A representativework is using reinforcement learning Reinforcement learn-ing derives from the psychology The main idea is throughconstant interaction with the environment and ldquotrial anderrorrdquo to maximize the reward defined by a reward function[16] Based on this concept a lot of obstacle avoidancealgorithms are proposed [17ndash22] It is worth noting thatwith a great improvement of hardware and the developmentof deep learning theory some methods can directly learnobstacle avoidance strategy from the sensor data called end-to-end learning [23] Intelligent algorithms usually can obtainthe global optimal solution and have good ability to adaptto the environment But often these methods bear highcomputational complexity and bad real-time performance

In this paper we propose an obstacle avoidance method based on teaching and learning. First, learning samples are obtained by showing the robot how to avoid the obstacle; then a probabilistic mixture model is used to extract and encode the task constraints autonomously; finally, an obstacle-free path is generated by GMR. Our approach differs from previous work in the following: (i) it is more in line with the habits of human cognition, drawing lessons from imitation learning; (ii) the algorithm has good generalization ability and can adapt to dynamic, unstructured environments; besides, the computational complexity is low, and in particular the path generation process can be realized online efficiently; (iii) the learning framework not only can be used to learn obstacle avoidance but also can be used for learning other skills, which is a key point toward achieving generalized artificial intelligence (GAI).

The main contributions of the paper are as follows: (i) the first contribution is using LfD for the obstacle avoidance task for the first time, making up for the disadvantages of other methods; (ii) due to the nature of LfD, nontechnical people can program the robot to realize the obstacle avoidance function; (iii) obstacle avoidance learning using mixture models is able to adapt to environmental changes and dynamic obstacles and, at the same time, has low computational complexity.

The remainder of the paper is organized as follows. In Section 2 some background knowledge is introduced, including the concepts of LfD and GMM. Section 3 introduces parameter solving methods for GMM. Section 4 gives a detailed introduction of the obstacle avoidance learning architecture. In Section 5 simulation experiments based on a Turtlebot robot are conducted. In the end, conclusions are drawn and further improvements are discussed.

2. Learning from Demonstration

LfD is a paradigm which allows robots to automatically acquire new skills. Unlike traditional robots, where fulfilling a task requires a prior task decomposition and programming of each action by humans, LfD adopts the view that a robot's skills can be acquired by observing human operation. This approach has the advantage of simple programming: even people who do not understand programming techniques can program robots through teaching. Besides, due to the nature of LfD, robots are better able to adapt to new environments. LfD is also named robot Programming by Demonstration (PbD), Imitation Learning, Apprenticeship Learning, Learning by Demonstration (LbD), Learning by Showing, Learning by Watching, Learning from Observation, and so forth. It is worth noting that some authors think that imitation learning and LfD have different meanings: in [24] the authors distinguish imitation learning and LfD according to how the learning data are recorded and how the imitator uses the data in the learning process.

2.1. Formulation of the LfD Problem. The solving process for LfD can be divided into two steps: first, acquire training data; second, use the obtained data to derive a control policy. We use $S$ and $A$ to represent the state set and action set, respectively. The probability distribution function $T(s' \mid s, a): S \times A \times S \to [0, 1]$ represents the transfer probability from state $(s, a)$ to state $s'$. An experience dataset $D$ should include this information. Actually, in most situations the true states $S$ are unknown; the observable states are $Z$, and then a learning problem can be defined as deriving a policy $\pi$ mapping from $Z$ to $A$. For an obstacle avoidance task, the problem can


be described as finding a more general barrier-free path (policy) from the demonstrated barrier-free paths (sample set $D$). The main process of the LfD framework is illustrated in Figure 1.

Figure 1: Framework of learning from demonstration (LfD). A demonstration produces a dataset $(S, A, T)$; policy derivation yields $\pi$, which the robot system executes during the task.

2.2. Main Research Topics in LfD. Nehaniv and Dautenhahn [25] summarized the main problems in LfD as four issues: (1) what to imitate; (2) how to imitate; (3) when to imitate; (4) whom to imitate. So far only the former two problems are solved. For the problem of robot obstacle avoidance, "what to imitate" means the robot needs to extract the right constraints which can characterize the task, and "how to imitate" refers to generating a trajectory that satisfies the constraints obtained in the previous step.

In order to solve the above problems (mainly the former two), there are three main ideas:

(i) Directly learn a mapping function $\pi = f(z)$ from the training set $D = \{(z_i, a_i)\}$, where $z$ stands for the observed state. This idea treats the problem as a supervised learning problem.

(ii) Learn a system model, namely the return function and state transition probability distribution, from the training set, and then derive a policy using MDP methods.

(iii) Learn the pre- or postconditions of each action, and then obtain a control sequence using programming methods.

For the obstacle avoidance problem, learning from the teacher's demonstrated barrier-free paths to produce a generalized path can be seen as a supervised regression problem. GMM is a probabilistic model which is suitable for modelling the natural variations in human and robot motions. Theoretically, GMM can model any probability density function (PDF) by increasing the number of Gaussian components, so GMM has strong encoding capacity. Finally, movement recognition, prediction, and generation can be integrated within the same GMM and GMR encoding strategy. It is therefore very favorable for us to use GMM to encode our prior demonstration information and to use GMR as the solution to the regression problem mentioned above. Details of GMM theory are described below.

3. Gaussian Mixture Model

For a probability density estimation problem, if in addition to the known observation variables there exist some unobservable latent variables, but we know the distribution of the latent variable and the conditional distribution of the observable variable given the latent variable, then we can use mixture models to estimate the parameters of the distribution, so as to get the approximate probability distribution of the known data. Before introducing GMM, we briefly review the Gaussian distribution.

3.1. Marginal and Conditional Distribution of a Multivariate Gaussian Distribution. Assume that we have a random variable

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{1}$$

in which $x_1 \in \mathbb{R}^r$, $x_2 \in \mathbb{R}^s$, $x \in \mathbb{R}^{r+s}$, and $x \sim \mathcal{N}(\mu, \Sigma)$, where

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{2}$$

and $\mu_1 \in \mathbb{R}^r$, $\mu_2 \in \mathbb{R}^s$, $\mu \in \mathbb{R}^{r+s}$, $\Sigma_{11} \in \mathbb{R}^{r \times r}$, and $\Sigma_{12} \in \mathbb{R}^{r \times s}$. Noting that the covariance matrix is symmetric, we have $\Sigma_{12} = \Sigma_{21}^{T}$.

Because the marginal distribution of a Gaussian is also a Gaussian distribution, it is easy to get $E[x_1] = \mu_1$, $\operatorname{cov}(x_1) = \Sigma_{11}$, $E[x_2] = \mu_2$, and $\operatorname{cov}(x_2) = \Sigma_{22}$, so $x_1 \sim \mathcal{N}(\mu_1, \Sigma_{11})$ and $x_2 \sim \mathcal{N}(\mu_2, \Sigma_{22})$. As a step forward, we can prove $x_1 \mid x_2 \sim \mathcal{N}(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \tag{3}$$
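As a quick numerical check of (3), the following minimal NumPy sketch (our own illustration, not part of the original derivation; all values are toy numbers) conditions a 3-dimensional Gaussian on its last coordinate:

```python
import numpy as np

# Toy joint Gaussian: x1 in R^2 (first block), x2 in R^1 (second block).
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])   # symmetric positive definite

r = 2                                  # dimension of x1
mu1, mu2 = mu[:r], mu[r:]
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

x2 = np.array([2.5])                   # observed value of x2

# Equation (3): conditional mean and covariance of x1 | x2.
mu_1g2 = mu1 + S12 @ np.linalg.inv(S22) @ (x2 - mu2)
Sigma_1g2 = S11 - S12 @ np.linalg.inv(S22) @ S21
print(mu_1g2, Sigma_1g2)
```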

3.2. Formulation of GMM. Assume that we have a training set $D = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ containing $m$ samples, where every sample $x^{(i)}$ is a vector. We are trying to find a density function that describes the data with the least error, which is equal to maximizing the probability of the data. In order to represent this probability, we need to make some assumptions. First, we assume that the latent variable $z^{(i)} \sim \text{Multinomial}(\phi)$, in which $\phi_j \geq 0$ and $\sum_{j=1}^{k} \phi_j = 1$; besides, we suppose $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$. So we can model the data in our hands by adjusting the parameters $(\phi, \mu, \Sigma)$. The likelihood function can be written as

$$\ell(\phi, \mu, \Sigma) = \sum_{i=1}^{m} \log p(x^{(i)}; \phi, \mu, \Sigma) = \sum_{i=1}^{m} \log \sum_{z^{(i)}=1}^{k} p(x^{(i)} \mid z^{(i)}; \mu, \Sigma)\, p(z^{(i)}; \phi) \tag{4}$$

Figure 2: System architecture of obstacle avoidance learning. Training data (odometry: $x$, $\omega$; vision: $d$) are preprocessed (denoising, invalid-data detection, sequence alignment), encoded and retrieved by GMM/GMR ($\{\xi_j\} \to \{\mu_k, \Sigma_k\}$, answering "what to imitate"), and turned into a control policy (metric of imitation, optimization, answering "how to imitate") that sends the linear velocity $\dot{\xi}_{x_s}$ and angular velocity $\dot{\xi}_{\theta_s}$ to the robot.

Maximizing (4) yields the right model parameters. To realize this, we would typically take the partial derivative with respect to each parameter and set it to zero; but in fact no closed-form solution exists for this problem by that method. The main reason is that $z$ is unknown; if $z$ were known, the model would degrade to a Gaussian discriminant model, and following the previously mentioned steps a closed-form solution could be obtained. We will introduce a commonly used iterative algorithm to solve problem (4).

3.3. EM for GMM. Expectation Maximization (EM) is an iterative algorithm consisting mainly of two steps, called the E-step and the M-step. The main idea is to find a tight lower bound function of the objective function and then maximize that lower bound, so that the objective function is maximized indirectly. The reason for introducing a lower bound function is that finding its extreme value is much easier than for the original function. For GMM, the E-step tries to guess the value of $z^{(i)}$, which is equivalent to finding out which component the current sample belongs to; the model parameters are then updated based on this conjecture. The algorithm can be summarized as follows.

E-step: for each $i$ and $j$, compute the posterior probability of $z^{(i)}$:

$$w_j^{(i)} := p(z^{(i)} = j \mid x^{(i)}; \phi, \mu, \Sigma) \tag{5}$$

M-step: update the model parameters:

$$\phi_j := \frac{1}{m} \sum_{i=1}^{m} w_j^{(i)}, \qquad \mu_j := \frac{\sum_{i=1}^{m} w_j^{(i)} x^{(i)}}{\sum_{i=1}^{m} w_j^{(i)}}, \qquad \Sigma_j := \frac{\sum_{i=1}^{m} w_j^{(i)} (x^{(i)} - \mu_j)(x^{(i)} - \mu_j)^{T}}{\sum_{i=1}^{m} w_j^{(i)}} \tag{6}$$

Note that the value of $w_j^{(i)}$ can be computed by Bayes' rule:

$$w_j^{(i)} = \frac{p(x^{(i)} \mid z^{(i)} = j; \mu, \Sigma)\, p(z^{(i)} = j; \phi)}{\sum_{l=1}^{k} p(x^{(i)} \mid z^{(i)} = l; \mu, \Sigma)\, p(z^{(i)} = l; \phi)} \tag{7}$$

We iterate between (6) and (7) until convergence, which means that the changes of the model parameters are less than a threshold $\epsilon$ or the value of the likelihood function changes within a very narrow range. A minimal implementation sketch follows.
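As a minimal sketch (our own illustration; the function name `em_gmm` and the small regularization constant are ours, not from the paper), the E-step (5), (7) and M-step (6) can be written with NumPy and SciPy as follows:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, eps=1e-4):
    """Fit a k-component GMM to X (m samples x d dims) by EM."""
    m, d = X.shape
    phi = np.full(k, 1.0 / k)                        # mixing weights
    mu = X[np.random.choice(m, k, replace=False)]    # random initial means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step, (5) and (7): responsibilities w[i, j] via Bayes' rule.
        w = np.column_stack([
            phi[j] * multivariate_normal.pdf(X, mu[j], Sigma[j])
            for j in range(k)])
        ll = np.sum(np.log(w.sum(axis=1)))           # log-likelihood (4)
        w /= w.sum(axis=1, keepdims=True)
        # M-step, (6): re-estimate phi, mu, Sigma from responsibilities.
        Nj = w.sum(axis=0)
        phi = Nj / m
        mu = (w.T @ X) / Nj[:, None]
        for j in range(k):
            diff = X - mu[j]
            Sigma[j] = (w[:, j, None] * diff).T @ diff / Nj[j] \
                       + 1e-6 * np.eye(d)            # keep covariances PD
        if abs(ll - prev_ll) < eps:                  # convergence test
            break
        prev_ll = ll
    return phi, mu, Sigma
```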

4. The Architecture of Obstacle Avoidance Learning

Figure 2 shows the architecture of obstacle avoidance learning and the information flow between the modules. The whole system contains three modules: data acquisition and preprocessing; constraint extraction and trajectory retrieval; and optimal control strategy derivation. Mileage information is collected by built-in sensors such as odometers, and distance information is acquired by the Kinect. Through noise filtering, invalid data reduction, and sequence alignment of the original data, we get the input for the learning machine. Then the input signal is encoded by GMM, and a general

trajectory is generated by GMR. The general trajectory satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. Subsequent sections describe each part in detail.

Figure 3: Useful data collected by sensors. The blue line shows the demonstrated trajectory; the shallow-yellow fan-shaped sector represents the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.

4.1. Data Acquisition. The experimental data are acquired from a Turtlebot robot. Turtlebot is a widely used research platform compatible with ROS, integrating a variety of sensors, such as an odometer, and extensible vision sensors. Many kinds of applications can be developed on Turtlebot, such as mapping and navigation. In this paper we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the navigation algorithms in the Turtlebot development kit, simulating some trajectories as training samples; and (iii) by means of follower apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record sensor data as samples. In this paper we adopt the second method: we create a simulated environment in software and then make the robot execute multiple obstacle avoidance tasks in the simulated environment using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors like the odometer. Using the Kinect, we get the distance between the robot and obstacles as well as the distance from the robot to the target.

We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle in a specific range, and we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the sensor information mentioned above, respectively. Here the location information and velocity information are redundant. We think it is necessary to take both kinds of information into consideration, because location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally necessary in the initial stage of a task.

In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ for the current orientation, and $d$ for distance. Each variable contains a one-dimensional temporal value and $(D-1)$-dimensional spatial values; that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$. Here $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega \in \mathbb{R}$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N \times 2}$ represents the distance between robot and obstacles, and $d_{rg,s} \in \mathbb{R}^{N \times 2}$ represents the distance between robot and goal state. Supposing the coordinates of the goal are $o_{s,i}$, we have

$$d_{rg,s} = x_{s,ij} - o_{s,i}, \quad \forall i \in \{1, \ldots, n\},\; \forall j \in \{1, \ldots, T\} \tag{8}$$

where $i$ indicates the trajectory number and $j$ indicates the time within a trajectory. We can rewrite the above equation for velocities:

$$\dot{d}_{rg,s} = \dot{x}_{s,ij} - \dot{o}_{s,i} \tag{9}$$

A minimal sketch of (8) follows.
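For concreteness, here is a minimal sketch of (8) under assumed toy values (the coordinates below are illustrative, not measured data):

```python
import numpy as np

# One demonstrated trajectory of T location samples and the goal position.
x_s = np.array([[0.5, 2.0], [1.0, 2.2], [1.5, 2.6]])  # (T, 2) robot locations
o_s = np.array([6.0, 3.0])                            # goal coordinates

# Equation (8): per-time-step displacement from the goal (broadcast over T).
d_rg = x_s - o_s
```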

4.2. Data Preprocessing. Because of noise and bad demonstration quality, there exist serious mismatches and invalid data in the original recordings. As shown in Figure 4, there is no change at the beginning and end of these curves, so that part of the data is useless; to reduce computation cost, it is wise to discard it. In addition, each trajectory has a different length; we should resample them to a uniform length so that they can be handled more easily in subsequent steps. So, before inputting the data to our algorithms, preprocessing steps including noise filtering, useless data reduction, and resampling are conducted (a sketch is given below). Figure 4 displays the differences between the original data and the processed data.
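A minimal preprocessing sketch under our own assumptions (a motion threshold `tol` to detect the static head and tail, and linear interpolation for resampling; the exact filters used in the experiments are not specified at this level of detail):

```python
import numpy as np

def preprocess(traj, n_samples=200, tol=1e-3):
    """traj: (T, D) array of one demonstration recorded over time."""
    # Time steps where the signal actually changes; the flat head and
    # tail of the recording carry no information and are cut away.
    motion = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    active = np.where(motion > tol)[0]
    if active.size:
        traj = traj[active[0]:active[-1] + 2]
    # Resample every dimension to a uniform length by linear interpolation
    # so that all demonstrations can be stacked into one training set.
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.column_stack([np.interp(t_new, t_old, traj[:, i])
                            for i in range(traj.shape[1])])
```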

4.3. Trajectory Encoding and Retrieval. Consider the data-points in the training set, $\{\xi_j = \{\xi_{t,j}, \xi_{s,j}\}\}_{j=1}^{N} \in D$, where each $\xi_j$ is a $D$-dimensional vector. We encode the data with a GMM, supposing we have $K$ Gaussian components. The probability items in (4) can then be represented as

$$p(k) = \pi_k, \qquad p(\xi_j \mid k) = \mathcal{N}(\xi_j; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^{D} |\Sigma_k|}}\, e^{-\frac{1}{2} (\xi_j - \mu_k)^{T} \Sigma_k^{-1} (\xi_j - \mu_k)} \tag{10}$$

in which $\pi_k$, $\mu_k$, and $\Sigma_k$ are model parameters representing the prior probability, mean, and covariance, respectively.


Figure 4: Comparison of original and processed data ($x$, $\omega$, $d_{ro}$, and $d_{rg}$ versus time $t$). (a) shows the original data; red lines mark the cutting positions. (b) displays the valid data after processing.

To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges to a given tolerance, we get the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we encode the trajectory data successfully.

In order to get a general barrier-free trajectory (i.e., given an input, which can be a temporal signal or another kind of sensor data, a barrier-free path is generated according to that input), a natural idea is to divide the variable $\xi$ into two parts $(\xi_t^{\mathcal{I}}, \xi_t^{\mathcal{O}})$, representing input and output, respectively. Then the trajectory retrieval problem can be formalized as follows: given $\xi \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_i, \Sigma_i)$ and $\xi_t^{\mathcal{I}}$, solve for the probability density distribution of $\xi_t^{\mathcal{O}}$. The solution to this problem is called Gaussian mixture regression. Regarding $\xi_t^{\mathcal{I}}$ as query points, $\xi_t^{\mathcal{O}}$ can be estimated using regression methods. In this paper we use superscripts $\mathcal{I}$ and $\mathcal{O}$ to stand for the input and output parts of each variable. With this notation, a block decomposition of the data-point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi_t^{\mathcal{I}} \\ \xi_t^{\mathcal{O}} \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu_k^{\mathcal{I}} \\ \mu_k^{\mathcal{O}} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{\mathcal{I}} & \Sigma_k^{\mathcal{IO}} \\ \Sigma_k^{\mathcal{OI}} & \Sigma_k^{\mathcal{O}} \end{bmatrix} \tag{11}$$

Based on (3), for a time step $t$, the conditional mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}_k^{\mathcal{O}}(\xi_t^{\mathcal{I}}) = \mu_k^{\mathcal{O}} + \Sigma_k^{\mathcal{OI}} \left(\Sigma_k^{\mathcal{I}}\right)^{-1} \left(\xi_t^{\mathcal{I}} - \mu_k^{\mathcal{I}}\right), \qquad \hat{\Sigma}_k^{\mathcal{O}} = \Sigma_k^{\mathcal{O}} - \Sigma_k^{\mathcal{OI}} \left(\Sigma_k^{\mathcal{I}}\right)^{-1} \Sigma_k^{\mathcal{IO}} \tag{12}$$

Taking all $K$ components into consideration, the distribution of $\xi_t^{\mathcal{O}}$ given $\xi_t^{\mathcal{I}}$ is

$$P(\xi_t^{\mathcal{O}} \mid \xi_t^{\mathcal{I}}) \sim \sum_{k=1}^{K} h_k(\xi_t^{\mathcal{I}})\, \mathcal{N}\!\left(\hat{\mu}_k^{\mathcal{O}}(\xi_t^{\mathcal{I}}),\, \hat{\Sigma}_k^{\mathcal{O}}\right) \tag{13}$$


where

$$h_k(\xi_t^{\mathcal{I}}) = \frac{\pi_k\, \mathcal{N}(\xi_t^{\mathcal{I}} \mid \mu_k^{\mathcal{I}}, \Sigma_k^{\mathcal{I}})}{\sum_{l=1}^{K} \pi_l\, \mathcal{N}(\xi_t^{\mathcal{I}} \mid \mu_l^{\mathcal{I}}, \Sigma_l^{\mathcal{I}})} \tag{14}$$

Equation (13) represents a multimodal distribution, which can be approximated by the single Gaussian distribution

$$P(\xi_t^{\mathcal{O}} \mid \xi_t^{\mathcal{I}}) = \mathcal{N}(\hat{\mu}_t^{\mathcal{O}}, \hat{\Sigma}_t^{\mathcal{O}}), \quad \text{with} \quad \hat{\mu}_t^{\mathcal{O}} = \sum_{i=1}^{K} h_i(\xi_t^{\mathcal{I}})\, \hat{\mu}_i^{\mathcal{O}}(\xi_t^{\mathcal{I}}), \qquad \hat{\Sigma}_t^{\mathcal{O}} = \sum_{i=1}^{K} h_i(\xi_t^{\mathcal{I}}) \left(\hat{\Sigma}_i^{\mathcal{O}} + \hat{\mu}_i^{\mathcal{O}}(\xi_t^{\mathcal{I}})\, \hat{\mu}_i^{\mathcal{O}}(\xi_t^{\mathcal{I}})^{T}\right) - \hat{\mu}_t^{\mathcal{O}} \left(\hat{\mu}_t^{\mathcal{O}}\right)^{T} \tag{15}$$

Substituting $\xi_t^{\mathcal{I}}$ into the equations described above to compute $\xi_t^{\mathcal{O}}$, a general trajectory is obtained. This trajectory not only avoids obstacles but also can be executed by the robot thanks to its smoothness. A minimal retrieval sketch is given below.
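The retrieval equations (12)-(15) map directly onto code. The following minimal sketch (our own illustration; the function name `gmr` and the index lists `I`, `O` are ours) evaluates the conditional mean and covariance for a single query input:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr(pi, mu, Sigma, xi_in, I, O):
    """pi: (K,) priors; mu: (K, D) means; Sigma: (K, D, D) covariances;
    xi_in: query input of dimension len(I); I, O: input/output index lists."""
    K = len(pi)
    # (14): responsibility h_k of each component for this query input.
    h = np.array([pi[k] * multivariate_normal.pdf(
                      xi_in, mu[k][I], Sigma[k][np.ix_(I, I)])
                  for k in range(K)])
    h /= h.sum()
    comps = []
    mu_out = np.zeros(len(O))
    for k in range(K):
        Sii_inv = np.linalg.inv(Sigma[k][np.ix_(I, I)])
        # (12): conditional mean and covariance of the kth component.
        m_k = mu[k][O] + Sigma[k][np.ix_(O, I)] @ Sii_inv @ (xi_in - mu[k][I])
        S_k = Sigma[k][np.ix_(O, O)] \
              - Sigma[k][np.ix_(O, I)] @ Sii_inv @ Sigma[k][np.ix_(I, O)]
        comps.append((m_k, S_k))
        mu_out += h[k] * m_k                       # (15): mixture mean
    Sig_out = -np.outer(mu_out, mu_out)            # (15): mixture covariance
    for k, (m_k, S_k) in enumerate(comps):
        Sig_out += h[k] * (S_k + np.outer(m_k, m_k))
    return mu_out, Sig_out
```

Sweeping `xi_in` over the time signal (or another chosen input) and collecting `mu_out` yields the general trajectory.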

4.4. Finding an Optimum Control Policy. From the previous section we obtain a general trajectory, which solves a planning problem. How do we combine planning with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{\mathcal{O}}$ is equal to $\xi_s$, where $s$ denotes spatial location information. Let $\hat{\xi}_s$ represent the desired trajectory and $\xi_s$ the actually reached location. In order to evaluate the tracking performance, a simple idea is to use the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \left(\xi_{s,i} - \hat{\xi}_{s,i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^{T} W \left(\xi_s - \hat{\xi}_s\right) \tag{16}$$

Naturally, the above equation can be applied to multiple variables. In this paper we have four variables. Letting $\hat{x}_s$, $\hat{\omega}_s$, $\hat{d}_{ro,s}$, and $\hat{d}_{rg,s}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as

$$\begin{aligned} P = {} & (x_s - C_x)^{T} W_x (x_s - C_x) + (\omega_s - C_\omega)^{T} W_\omega (\omega_s - C_\omega) \\ & + (d_{ro,s} - C_{d_{ro}})^{T} W_{d_{ro}} (d_{ro,s} - C_{d_{ro}}) + (d_{rg,s} - C_{d_{rg}})^{T} W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}) \end{aligned} \tag{17}$$

where

$$C_{*,ij} = \hat{*}_{s,ij} - \xi_{*,s,i(j-1)}, \quad \forall i \in \{1, \ldots, n\},\; \forall j \in \{1, \ldots, T\},\; * \in \{x, \omega, d_{ro}, d_{rg}\} \tag{18}$$

According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. But noting that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as follows:

$$L(x_s, \omega_s, d_{ro,s}, d_{rg,s}, \lambda) = P + \lambda^{T} (x_s - d_{rg,s}) \tag{19}$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to $x_s$ and $d_{rg,s}$, we find

$$2W_x (x_s - C_x) + \lambda = 0, \qquad 2W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}) - \lambda = 0 \tag{20}$$

Adding the two equations in (20), we find

$$2W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}) + 2W_x (x_s - C_x) = 0 \tag{21}$$

Substituting $x_s = -d_{rg,s}$ into the above equation, we find

$$x_s = (W_x - W_{d_{rg}})^{-1} (W_x C_x + W_{d_{rg}} C_{d_{rg}}) \tag{22}$$

The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely,

$$W = \left(\hat{\Sigma}_t^{\mathcal{O}}\right)^{-1} \tag{23}$$

A minimal sketch of this computation follows.
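Here is a minimal sketch of (22)-(23) under our own naming (and assuming the matrix $W_x - W_{d_{rg}}$ is invertible at the given time step):

```python
import numpy as np

def optimal_location(C_x, C_drg, Sigma_x, Sigma_drg):
    """Optimal location command from the references C_x, C_drg and the
    GMR output covariances of the position and goal-distance variables."""
    W_x = np.linalg.inv(Sigma_x)       # (23): W = inverse covariance
    W_drg = np.linalg.inv(Sigma_drg)
    # (22): x_s = (W_x - W_drg)^{-1} (W_x C_x + W_drg C_drg)
    return np.linalg.solve(W_x - W_drg, W_x @ C_x + W_drg @ C_drg)
```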

5. Simulation Experiments

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 m × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map with a small gap on its lower side, and the robot can reach the target place through the gap. The second map is much like the first one, except that there are two obstacles; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the maps: Figures 5(a) and 5(b) are the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning: warm colors such as yellow and red stand for a lower cost, while cool tones stand for a higher cost, and a path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the obstacle avoidance tasks in maps (a) and (b), respectively. It is worth mentioning that we select relatively simple environments for both tasks because we put more emphasis on the mechanism behind obstacle avoidance: skills learned in a simple environment can be reused in complex situations.


Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software (start and goal marked); (c) and (d) are the simulation environments corresponding to the maps above them.

Figure 6: Demonstrated trajectories in the two situations. (a) shows the sampled data in task 1; (b) shows the sampled data in task 2.

Figure 6 shows the demonstrated trajectories in the two situations. The coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the X-coordinate and between 2 and 4 for the Y-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2 the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.

5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance information $d_{rg}$; both are 2-dimensional vectors.


Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1 (variables $x$ and $d_{rg}$). (a) Model encoding with GMM; dark-blue ellipses are the projections of the Gaussian components. (b) The general trajectory generated by GMR; the red shading is the generation region.

The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as Cross Validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL); in this paper we decide $K$ by using the Cross Validation criterion (a model selection sketch is given below). As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; it indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM successfully extracts the constraints that characterize the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show encoding and retrieval results for the variable $d_{rg}$. As can be seen, $d_{rg}$ shares similar distribution properties with $x$; this is because the two variables have a fixed relationship with each other, and so do their means and covariances, which verifies the validity of the method indirectly.
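The model selection step can be sketched as follows; for brevity this illustration scores candidate models with BIC via scikit-learn, whereas the paper itself uses the Cross Validation criterion (which would score held-out log-likelihood instead):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k(X, k_max=15):
    """Return the number of components with the lowest BIC on data X."""
    bics = [GaussianMixture(n_components=k).fit(X).bic(X)
            for k in range(1, k_max + 1)]
    return int(np.argmin(bics)) + 1
```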


Figure 8: Encoding and retrieval for orientation ($\omega$) and distance ($d_{ro}$, $d_{rg,x}$, $d_{rg,y}$) information in task 1, with the same meaning as the charts in Figure 7.

Figure 8 shows the encoding results for orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we see, the coordinate values in the $X$ direction decrease monotonically to 0; this is because the robot keeps moving from left to right in the map. The coordinates in the $Y$ direction keep going up at the beginning and then drop to zero; this is because the robot encounters an obstacle midway and, in order to avoid it, has to move away from the goal state temporarily. Note the final distance value, which is close to (0, 0), meaning the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated in terms of the regularities behind the data; in order to encode these regularities, more Gaussian components are required. In our experiments $K$ is set to 10, using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.


Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d$ in task 2.

5.2. Experiments for Finding the Control Strategy. Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), using the theory derived in the previous sections we can compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.

Figure 12 compares the calculated linear velocity and angular velocity with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for Turtlebot are a linear velocity in the forward ($X$) direction and an angular velocity. The line graph for velocity $Y$ reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. Velocity $x$ follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) to achieve this purpose (a sketch is given below). The signals after alignment show a high correlation, which means the calculated values are correct.
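A minimal DTW sketch (our own illustration) that aligns two 1-D signals of different lengths, such as the calculated and recorded velocity sequences:

```python
import numpy as np

def dtw_path(a, b):
    """Return the DTW distance and warping path between 1-D signals a, b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):          # accumulated-cost matrix
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], n, m              # backtrack from (n, m)
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```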

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel idea: a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. To achieve this purpose, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.


Figure 10: Encoding and retrieval for orientation and distance information in task 2.

Figure 11: Tracking performance using the derived policy (expected path versus actual path).


Figure 12: Comparisons between the expected velocities and the calculated values. (a) Linear velocity ($V_x$, $V_y$); (b) angular velocity ($\omega$).

Future improvement work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, using the time signal as input; a more reasonable idea is to use state information as input, so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.

Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Doctoral Startup Fund of Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning—a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me—measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.


Page 3: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

Journal of Robotics 3

Demonstration Policy derivation Robot systemDataset

Learning form demonstration Task execution

(S A T)

120587

Figure 1 Framework of learning form demonstration (LfD)

be described as finding the more general barrier-free path(policy) from the demonstrated barrier-free path (sample119863)Themain process for LfD framework is illustrated in Figure 1

22 Main Research Topics in LfD Nehaniv and Dautenhahn[25] summarized themain problems in LfD as four issues (1)What to imitate (2) How to imitate (3) When to imitate(4) Whom to imitate So far only the former two problemsare solved For the problem of robot obstacle avoidancewhat to imitate means the robot needs to extract the rightconstraints which can characterize the task and problem ofhow to imitate refers to generating a trajectory that satisfiesthe constraints obtained in the previous step

In order to solve the above problems (mainly the formertwo problems) there are three main ideas

(i) Directly get a map function 120587 = 119891(119911) from trainingset 119863 = (119911

119894 119886119894) where 119911 stands for the observed

state This idea treats the problem as supervisedlearning problem

(ii) Learn a system model namely the return functionand state transition probability distribution formtraining set and then derive a policy using MDPmethods

(iii) Learn the pre- or postcondition of each action andthen obtain control sequence using programmingmethods

For the obstacle avoidance problem through learning thebarrier-free path from teachers to produce a generalized pathcan be seen as a supervised regression problem GMM is aprobabilistic model which is suitable to model the naturalvariations in human and robot motionsTheoretically GMMcan model any probabilistic density functions (PDFs) byincreasing the number of Gaussian components so GMMhas strong encoding capacity Finally movements recogniz-ing predicting and generating can be integrated togetherwithin the same GMM and GMR encoding strategy So itis very favorable for us to use GMM encoding our priordemonstration information and use GMR as the solution tothe regression problem mentioned above Details of GMMtheory are described below

3 Gaussian Mixture Model

For a probability density estimation problem if in additionto the known observation variables there may exist some

unobservable latent variables but we know the distributionof the latent variable and the conditional distribution of theobservable variable given the latent variable then we can usemixture models to estimate the parameters of the distribu-tion so as to get the approximate probability distribution ofthe known data Before introducing GMM we deviate to talkabout Gaussian distribution

31 Marginal and Conditional Distribution of a MultivariateGaussian Distribution Assume that we have a random vari-able

119909 = [1199091

1199092

] (1)

in which 1199091isin R119903 119909

2isin R119904 119909 isin R119903+119904 119909 sim N(120583 Σ) where

120583 = [1205831

1205832

]

Σ = [Σ11

Σ12

Σ21

Σ22

]

(2)

And 1205831isin R119903 120583

2isin R119904 120583 isin R119903+119904 Σ

11isin R119903times119903 and Σ

12isin

R119903times119904 Noting that the covariance matrix is symmetric so wehave Σ

12= Σ21

119879Because the marginal distribution of a Gaussian is also

a Gaussian distribution it is easy to get that 119864[1199091] = 120583

1

cov(1199091) = Σ

11 119864[1199092] = 1205832 and cov(119909

2) = Σ

22 and we have

1199091sim N(120583

1 Σ11) and 119909

2sim N(120583

2 Σ22) As a step forward we

can prove 1199091| 1199092sim N(120583

1|2 Σ1|2

) where

1205831|2

= 1205831+ Σ12Σminus1

22(1199092minus 1205832)

Σ1|2

= Σ11

minus Σ12Σminus1

22Σ21

(3)

32 Formulation of GMM Assuming that we have a trainingset 119863 = 119909

(1) 119909(2)

119909(119898)

containing 119898 samples everysample 119909(119894) is a vectorWe are trying to find a density functionto describe the data with the least error which is equal tomaximizing the probability of the data In order to representthis probability we need to make some assumptions Firstwe assume that the latent variable 119911

(119894)sim Multinomial(120601) in

which 120601119895ge 0 sum

119896

119895=1120601119895= 1 besides we suppose 119909

(119894)| 119911(119894)

=

119895 sim N(120583119895 Σ119895) So we can model the data in our hands by

4 Journal of Robotics

Data acquisitiondemonstration and recording

Preprocessingdenoising invalid datadetection sequencealignment

Encoding and retrievalGMM GMR

Control policymetric of imitationoptimization

Robot

Training data What to imitate How to imitate

Odom data

Vision data

x 120596

d

120585j 120583k Σk

120585xs linear velocity

120585120579s angular velocity

Figure 2 System architecture of obstacle avoidance learning

adjusting parameters (120601 120583 Σ)The likelihood function can bewritten as

119897 (120601 120583 Σ) =

119898

sum

119894=1

log119901 (119909(119894) 120601 120583 Σ)

=

119898

sum

119894=1

log119896

sum

119911(119894)=1

119901 (119909(119894)

| 119911(119894) 120583 Σ) 119901 (119911

(119894) 120601)

(4)

Maximize (4) and then we can find the right modelparameters To realize this purpose typically we find thepartial derivatives for each of the parameters and set themequal to zero but actually we find that we cannot get a closed-loop solution for this problem using the above method Themain reason is that 119911 is unknown if 119911 is known the modeldegrades to a Gaussian discriminant model following thepreviously mentioned steps we can obtain a closed-loopsolutionWewill introduce a kind of commonly used iterativealgorithm to solve the problem in (4)

33 EM for GMM Expectation Maximization (EM) is aniterative algorithm mainly including two steps which arecalled E-step andM-stepThemain idea is to find a tight lowerbound function of the objective function and then maximizethe lower bound function so that the objective functioncan be maximized indirectly The reason for introducing alower bound function is that finding the extreme value ofthe lower bound function is much easier compared to theoriginal function For GMM the E-step tries to guess thevalue of 119911(119894) equivalent to finding out which components thecurrent sample belongs to and then the model parametersare updated based on our conjecture The algorithm can besummarized as follows

E-step for each 119894 and 119895 compute the poster-probability of119911(119894)

119908119894

119895fl 119901 (119911

(119894)= 119895 | 119909

(119894) 120601 120583 Σ) (5)

M-step modify our model parameters

120601119895fl

1

119898

119898

sum

119894=1

119908(119894)

119895

120583119895fl

sum119898

119894=1119908(119894)

119895119909(119894)

sum119898

119894=1119908(119894)

119895

Σ119895fl

sum119898

119894=1119908(119894)

119895(119909(119894)

minus 120583119895) (119909(119894)

minus 120583119895)119879

sum119898

119894=1119908(119894)

119895

(6)

Note that the value of 119908119894

119895can be computed using the

following equation

119908119894

119895fl 119901 (119911

(119894)= 119895 | 119909

(119894) 120601 120583 Σ)

=

119901 (119909(119894)

| 119911(119894)

= 119895 120583 Σ) 119901 (119911(119894)

= 119895 120601)

sum119896

119897=1119901 (119909(119894) | 119911(119894) = 119897 120583 Σ) 119901 (119911(119894) = 119897 120601)

(7)

iterating between (6) and (7) until convergence whichmeansthat the changes ofmodel parameters are less than a threshold120598 or the value of likelihood function changes in a very narrowrange

4 The Architecture of ObstacleAvoidance Learning

Figure 2 shows the architecture of obstacle avoidance learningand the information flow between different modules Thewhole system contains three modules data acquisition andpreprocessing constraints extraction and trajectory retrievaland optimal control strategy derivation Mileage informationis collected by built-in sensors such as odometers anddistance information is acquired by Kinect Through noisefiltering invalid data reduction and sequence alignment forthe original data we get the input for the learning machineThen the input signal is encoded by GMM and a general

Journal of Robotics 5

Start

GoalObstacle

x120579

120596

dro

drg

Figure 3 Useful data collected by sensors Line in blue showsthe demonstrated trajectory Sectors with a fan shape in shallowyellow color represent the visual field of the Kinect 119909 120596 119889ro

119889rg

represent location orientation and distance information of therobot respectively

trajectory is generated by GMR The general trajectorysatisfies all the constraints for the task Eventually a controlleris designed to follow the expected input trajectory and outputthe control commands to the robot Subsequent sections willdescribe each part in detail

41 DataAcquisition Theexperimental data is acquired formaTurtlebot robot Turtlebot is awidely used research platformwhich is compatible with ROS integrating a variety ofsensors such as odometer and extensible vision sensors Wecan develop many kinds of applications by using Turtlebotsuch as mapping and navigation In this paper we willimplement our obstacle avoidance algorithm on the bases ofROS development kit for TurtlebotThe teaching process canbe achieved by three ways (i) using a remote joystick (ii)with the aid of Turtlebot development kit for the navigationalgorithms simulating some trajectories as training samplesand (iii) by means of the following apps implemented onTurtlebot the teacher can guide robot to avoid obstacles andrecord sensor data as samples In this paper we adopt thesecond method which means that we create simulated envi-ronment in software and then make robot execute multipleobstacle avoidance tasks in the simulated environment usingbuilt-in navigation algorithms During this process linearvelocity and angular velocity are obtained by built-in sensorslike odometer Using Kinect we can get the distance betweenthe robot and obstacles as well as distance from the robot tothe target

We assume that the important sensing information comesfrom (i) states of the robot defined as linear velocity andangular velocity (ii) the robotrsquos absolute location (iii) dis-tance between the robot and obstacles noting that the Kinectreturns many pieces of distance information representingthe relative position of the obstacle in a specific range andwe choose the shortest distance as our reference value and(iv) distance between the robot and the target which canbe computed indirectly As illustrated in Figure 3 we use

(119909 120596 119889ro 119889

rg) to represent the sensor informationmentioned

above respectively Here the location information and veloc-ity information are redundantWe think it is necessary to takeboth kinds of information into consideration for the reasonthat location information is very important and direct foran obstacle avoidance task In addition choosing variablesrelating to the task as many as possible is generally verynecessary in the initial stage of a task

In the experiment demonstration is conducted 119899 timesIn each demonstration 119879 data points are recorded so thedataset contains 119873 = 119899 times 119879 points in total Each samplepoint is a collection of variables represented as119863 = 119909 120596 119889in which 119909 stands for the location of robot 120596 stands for thecurrent orientation and 119889 stands for distance Each variablecontains a one-dimensional temporal value and (119889 minus 1)-dimensional spatial values that is 119909 = 119909

119904 119909119905 120596 = 120596

119904 120596119905

119889ro

= 119889ro119904 119889

ro119905 and 119889

rg= 119889

rg119904 119889

rg119905 119909119904isin R119873times2 represent

location 120596 isin R represent orientation 119889ro119904

isin R119873times2 representdistance between robot and obstacles and 119889

rg119904

isin R119873times2

represent distance between robot and goal state Supposingcoordinates of the goal are 119900

119904119894 then we have

119889rg119904

= 119909119904119894119895

minus 119900119904119894 forall119894 isin 1 119899 forall119895 isin 1 119879 (8)

where 119894 indicates the trajectory number and 119895 indicatestime in a trajectory We can rewrite the above equation forvelocities

rg119904

= 119904119894119895

minus 119904119894 (9)

42 Data Preprocessing Because of noise and bad demon-stration quality there exist seriousmismatch and invalid datain the original data As shown in Figure 4 there is no changeat the beginning and end parts of these curves so this partof data is useless To reduce computation cost it is a wisechoice to abandon these data In addition each trajectory hasdifferent lengthwe should resample them to a uniform lengthso that we can handle them easier in the subsequent stepsSo before inputting the data to our algorithms preprocessingsteps including noise filter useless data reduction and resam-pling are conducted Figure 4 displays differences between theoriginal data and the processed data

43 Trajectory Encoding and Retrieval Considering a data-point in training set 120585

119895= 120585119905119895 120585119904119895119873

119895=1isin 119863 120585

119895is a119863-dimensi-

onal vector We encode the data with GMM supposing wehave 119870 Gaussian components The probability items in (4)can be represented as

119901 (119896) = 120587119896

119901 (120585119895| 119896) = N (120585

119895 120583119896 Σ119896)

=1

radic(2120587)119863 1003816100381610038161003816Σ119896

1003816100381610038161003816

119890minus(12)((120585119895minus120583119896)

119879summinus1119896 (120585119895minus120583119896))

(10)

in which 120587119896 120583119896 Σ119896 are model parameters representing

prior-probability mean value and variance respectively

6 Journal of Robotics

0

2

4

6x

(m)

minus1

0

1

120596(r

ads

)

0

2

4

dro

(m)

0

2

4

drg

(m)

100 2000t (s)

100 2000t (s)

100 2000t (s)

100 2000t (s)

(a)

0

2

4

6

x(m

)

50 100 150 2000t (s)

minus1

0

1

120596(r

ads

)

50 100 150 2000t (s)

0

2

4

dro

(m)

50 100 150 2000t (s)

0

2

4

drg

(m)

50 100 150 2000t (s)

(b)

Figure 4 Comparison of original data and processed data (a) shows the original data Lines in red represent cutting position (b) displaysvalid data after processing

To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges to within a given tolerance, we obtain the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we have encoded the trajectory data successfully.

We now want to retrieve a general barrier-free trajectory: given an input, which can be a temporal signal or another kind of sensor data, a barrier-free path should be generated according to that input. A natural idea is to divide the variable $\xi$ into two parts $(\xi^{I}_t, \xi^{O}_t)$, representing input and output, respectively. The trajectory retrieval problem can then be formalized as follows: given $\xi \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_i, \Sigma_i)$ and $\xi^{I}_t$, solve for the probability density of $\xi^{O}_t$. The solution to this problem is called Gaussian mixture regression: regarding $\xi^{I}_t$ as query points, $\xi^{O}_t$ can be estimated using regression methods. In this paper we use the superscripts $I$ and $O$ to stand for the input and output parts of each variable. With this notation, a block decomposition of the data point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi^{I}_t \\ \xi^{O}_t \end{bmatrix}, \quad \mu_k = \begin{bmatrix} \mu^{I}_k \\ \mu^{O}_k \end{bmatrix}, \quad \Sigma_k = \begin{bmatrix} \Sigma^{I}_k & \Sigma^{IO}_k \\ \Sigma^{OI}_k & \Sigma^{O}_k \end{bmatrix}. \tag{11}$$

Based on (3), for a time step $t$ the mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}^{O}_k(\xi^{I}_t) = \mu^{O}_k + \Sigma^{OI}_k \left(\Sigma^{I}_k\right)^{-1} \left(\xi^{I}_t - \mu^{I}_k\right), \qquad \hat{\Sigma}^{O}_k = \Sigma^{O}_k - \Sigma^{OI}_k \left(\Sigma^{I}_k\right)^{-1} \Sigma^{IO}_k. \tag{12}$$

Taking all $K$ components into consideration, the distribution of $\xi^{O}_t$ given $\xi^{I}_t$ is

$$P\left(\xi^{O}_t \mid \xi^{I}_t\right) \sim \sum_{k=1}^{K} h_k\left(\xi^{I}_t\right) \mathcal{N}\!\left(\hat{\mu}^{O}_k\left(\xi^{I}_t\right), \hat{\Sigma}^{O}_k\right), \tag{13}$$

where

$$h_k\left(\xi^{I}_t\right) = \frac{\pi_k\, \mathcal{N}\left(\xi^{I}_t \mid \mu^{I}_k, \Sigma^{I}_k\right)}{\sum_{k'=1}^{K} \pi_{k'}\, \mathcal{N}\left(\xi^{I}_t \mid \mu^{I}_{k'}, \Sigma^{I}_{k'}\right)}. \tag{14}$$

Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P\left(\xi^{O}_t \mid \xi^{I}_t\right) = \mathcal{N}\left(\hat{\mu}^{O}_t, \hat{\Sigma}^{O}_t\right), \quad \text{with}$$

$$\hat{\mu}^{O}_t = \sum_{i=1}^{K} h_i\left(\xi^{I}_t\right) \hat{\mu}^{O}_i\left(\xi^{I}_t\right), \qquad \hat{\Sigma}^{O}_t = \sum_{i=1}^{K} h_i\left(\xi^{I}_t\right) \left(\hat{\Sigma}^{O}_i + \hat{\mu}^{O}_i\left(\xi^{I}_t\right) \hat{\mu}^{O}_i\left(\xi^{I}_t\right)^{T}\right) - \hat{\mu}^{O}_t \left(\hat{\mu}^{O}_t\right)^{T}. \tag{15}$$

Substituting $\xi^{I}_t$ into the equations described above yields $\xi^{O}_t$ and hence a general trajectory. The trajectory not only avoids obstacles but also can be executed by robots because of its good smoothness.
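Below is a minimal sketch of the GMR conditioning step described by (12)-(15). The array layout (inputs occupying the first `in_dim` dimensions) and the function name are our own illustration under those assumptions, not code from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmr(pi, mu, sigma, xi_in, in_dim=1):
    """Condition a fitted GMM on the input block.
    pi: (K,), mu: (K, D), sigma: (K, D, D); xi_in: (in_dim,) query, e.g. time."""
    K, D = len(pi), mu.shape[1]
    I, O = slice(0, in_dim), slice(in_dim, D)
    # Responsibilities h_k of eq. (14), evaluated on the input block.
    h = np.array([pi[k] * multivariate_normal.pdf(xi_in, mu[k, I], sigma[k][I, I])
                  for k in range(K)])
    h /= h.sum()
    mu_out = np.zeros(D - in_dim)
    cov_out = np.zeros((D - in_dim, D - in_dim))
    for k in range(K):
        inv_in = np.linalg.inv(sigma[k][I, I])
        # Component conditional mean and covariance, eq. (12).
        m_k = mu[k, O] + sigma[k][O, I] @ inv_in @ (xi_in - mu[k, I])
        c_k = sigma[k][O, O] - sigma[k][O, I] @ inv_in @ sigma[k][I, O]
        mu_out += h[k] * m_k
        cov_out += h[k] * (c_k + np.outer(m_k, m_k))   # moment sum, eq. (15)
    cov_out -= np.outer(mu_out, mu_out)
    return mu_out, cov_out                             # single-Gaussian approx.
```

Stepping `xi_in` through the time samples and stacking the returned means reproduces the retrieved trajectory.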

4.4. Finding an Optimum Control Policy. From the previous section we can obtain a general trajectory, but this only solves the planning problem. How do we combine planning with control? In other words, how do we find a set of control inputs that makes the robot follow the general trajectory accurately? For the time-based GMR, $\xi^{O}_t$ is the spatial location information $\xi_s$. Let $\hat{\xi}_s$ denote the desired trajectory and $\xi_s$ the actually reached location. To evaluate the tracking performance, a simple idea is to use the weighted Euclidean distance

$$\sum_{i=1}^{D-1} w_i \left(\xi_{s,i} - \hat{\xi}_{s,i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^{T} W \left(\xi_s - \hat{\xi}_s\right). \tag{16}$$

Naturally, the above equation can be applied to multiple variables; in this paper we have four. Letting $\hat{x}$, $\hat{\omega}$, $\hat{d}^{ro}$, and $\hat{d}^{rg}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as

$$P = \left(\dot{x}_s - C_x\right)^{T} W_x \left(\dot{x}_s - C_x\right) + \left(\dot{\omega}_s - C_\omega\right)^{T} W_\omega \left(\dot{\omega}_s - C_\omega\right) + \left(\dot{d}^{ro}_s - C_{d^{ro}}\right)^{T} W_{d^{ro}} \left(\dot{d}^{ro}_s - C_{d^{ro}}\right) + \left(\dot{d}^{rg}_s - C_{d^{rg}}\right)^{T} W_{d^{rg}} \left(\dot{d}^{rg}_s - C_{d^{rg}}\right), \tag{17}$$

where the desired displacement at each step is

$$C_{*,ij} = \hat{*}_{s,ij} - \xi^{*}_{s,i(j-1)}, \quad \forall i \in \{1, \ldots, n\},\ \forall j \in \{1, \ldots, T\},\ \forall * \in \{x, \omega, d^{ro}, d^{rg}\}. \tag{18}$$

According to the performance indicator $P$, finding a desired control policy is equivalent to minimizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), optimizing (17) turns into a constrained optimization problem. Such problems are usually solved by constructing a Lagrange function, defined as follows:

$$L\left(\dot{x}_s, \dot{\omega}_s, \dot{d}^{ro}_s, \dot{d}^{rg}_s, \lambda\right) = P + \lambda^{T} \left(\dot{x}_s - \dot{d}^{rg}_s\right), \tag{19}$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and taking partial derivatives with respect to $\dot{x}_s$ and $\dot{d}^{rg}_s$, we find

$$2 W_x \left(\dot{x}_s - C_x\right) + \lambda = 0, \qquad 2 W_{d^{rg}} \left(\dot{d}^{rg}_s - C_{d^{rg}}\right) - \lambda = 0. \tag{20}$$

Adding the two equations in (20) gives

$$2 W_{d^{rg}} \left(\dot{d}^{rg}_s - C_{d^{rg}}\right) + 2 W_x \left(\dot{x}_s - C_x\right) = 0. \tag{21}$$

Substituting $\dot{x}_s = -\dot{d}^{rg}_s$ into the above equation, we find

$$\dot{x}_s = \left(W_x - W_{d^{rg}}\right)^{-1} \left(W_x C_x + W_{d^{rg}} C_{d^{rg}}\right). \tag{22}$$

The weight matrices in (22) can be computed as the inverse of the current covariance matrix from the GMR step, namely,

$$W = \left(\hat{\Sigma}^{O}_t\right)^{-1}. \tag{23}$$
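A sketch of one control update following (22) and (23) is given below. The function signature is our own, and it assumes $W_x - W_{d^{rg}}$ is invertible at the current time step.

```python
import numpy as np

def control_step(C_x, C_drg, cov_x, cov_drg):
    """Blend the desired displacements C_x and C_drg with weights taken
    from the inverse GMR covariances, per eqs. (22) and (23)."""
    W_x = np.linalg.inv(cov_x)        # eq. (23): W = (Sigma_t^O)^-1
    W_drg = np.linalg.inv(cov_drg)
    # eq. (22); solve() avoids forming the inverse of (W_x - W_drg) explicitly.
    x_dot = np.linalg.solve(W_x - W_drg, W_x @ C_x + W_drg @ C_drg)
    return x_dot
```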

5. Simulation Experiments

Experiments are implemented on a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map with a small gap on the lower side, and the robot can reach the target place through the gap. The second map is similar to the first one, except that it contains two obstacles; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning: warm colors such as yellow and red stand for a lower cost, while cool colors stand for a higher cost. A path planner always tends to choose the trajectory with the least cost; the cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks because we put more emphasis on the mechanism behind obstacle avoidance: skills learned in a simple environment can be reused in complex situations.

Figure 5: Two different maps used in our simulation. (a) and (b) are the actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.

Figure 6: Demonstrated trajectories in the two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2. Axes: $x$ (m) versus $y$ (m).

Figure 6 shows the demonstrated trajectories in the two situations. The goal coordinates are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the $X$-coordinate and between 2 and 4 for the $Y$-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.

5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance information $d^{rg}$; both are 2-dimensional vectors.

Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; shading in red represents the generation region.

The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting the optimum based on some model selection criterion, such as cross-validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL); in this paper we choose $K$ by cross-validation. As the figures show, the variance at the starting position is large, corresponding to a large retrieval error; this indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot reaches the goal state accurately. Together, these show that GMM can successfully extract the useful constraints characterizing the task, and that GMR can retrieve a barrier-free trajectory from the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d^{rg}$. As can be seen, $d^{rg}$ shares similar distribution properties with $x$, because the two variables have a specific relationship with each other, and so

Figure 8: Encoding and retrieval for orientation and distance information in task 1, with the same meaning as the charts in Figure 7. The panels plot $\omega$ (rad/s), $d^{ro}$ (m), $d^{rg}_x$ (m), and $d^{rg}_y$ (m) against $t$ (s).

do their mean values and covariances, which indirectly verifies the validity of the method.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d^{rg}$ in the $X$ and $Y$ directions. As we can see, the coordinate values in the $X$ direction decrease monotonically to 0, because the robot keeps moving from left to right across the map. The coordinates in the $Y$ direction go up at the beginning and then drop to zero, because the robot encounters obstacles midway: to avoid them, it has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the regularities behind the data, and more Gaussian components are required to encode them. In our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still performs satisfactorily.
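As a sketch of this selection procedure, one can score candidate values of $K$ by cross-validated log-likelihood with scikit-learn; the candidate range and fold count below are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

def select_K(xi, candidates=range(2, 12), folds=5):
    """Pick the number of Gaussian components by cross-validated
    average log-likelihood (GaussianMixture.score on held-out folds)."""
    scores = []
    for K in candidates:
        gmm = GaussianMixture(n_components=K, covariance_type='full',
                              random_state=0)
        scores.append(np.mean(cross_val_score(gmm, xi, cv=folds)))
    return list(candidates)[int(np.argmax(scores))]
```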

Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d^{rg}$ in task 2.

5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments we choose the second trajectory in our samples as the reference), we can compute $\dot{x}_s$ and $\dot{\theta}_s$ using the theory derived in the previous sections. Iterating at every time step, we obtain a control sequence. Comparing the calculated values with the values recorded by the sensors, we can verify the validity of the derived results.

Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs of the Turtlebot are a linear velocity in the forward ($X$) direction and an angular velocity. The line graph for the $Y$ velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The $X$ velocity follows the expected value perfectly. From Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from that of the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; here we use dynamic time warping (DTW) for this purpose. The aligned signals show a high correlation, which means the calculated values are correct.
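A textbook DTW recurrence suffices for this alignment; the sketch below is our own illustration, not the paper's implementation, and returns only the alignment cost (a library routine would also return the warping path used to resample one signal onto the other).

```python
import numpy as np

def dtw_distance(a, b):
    """O(len(a)*len(b)) dynamic time warping between two 1-D signals,
    so that command and sensor sequences of different lengths can be
    compared after alignment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```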

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
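A Figure 11 style path can be reproduced by forward-integrating the unicycle kinematics from the velocity commands; in the sketch below, the time step `dt` and the initial pose are assumptions.

```python
import numpy as np

def integrate_unicycle(v, omega, pose0=(0.0, 0.0, 0.0), dt=0.1):
    """Integrate x' = v cos(theta), y' = v sin(theta), theta' = omega
    to recover the (x, y) path from linear/angular velocity sequences."""
    x, y, th = pose0
    path = [(x, y)]
    for vi, wi in zip(v, omega):
        x += vi * np.cos(th) * dt
        y += vi * np.sin(th) * dt
        th += wi * dt
        path.append((x, y))
    return np.array(path)
```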

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel approach in which the robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this consideration, extracting the task constraints is a key issue, that is, how to make the robot understand the task. To achieve this purpose, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories.

Figure 10: Encoding and retrieval for orientation and distance information in task 2.

Figure 11: Tracking performance using the derived policy (expected path versus actual path; $x$ (m) versus $y$ (m)).

Figure 12: Comparison between the expected and calculated values. (a) Linear velocity ($V_x$, $V_y$ in m/s); (b) angular velocity $\omega$ (rad/s).

Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.

Future work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, a time signal is used as input; a more reasonable idea is to use state information as input, so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs are generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.

Competing Interests

The authors declare that the grant, scholarship, and/or funding mentioned below do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning: a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101-106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314-1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179-1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90-98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78-88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102-1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1-6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950-2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301-304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747-2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221-1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752-759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672-679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142-148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593-600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17-27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85-89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251-281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404-3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462-466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739-746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469-483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me? Measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11-51, 2001.


Page 4: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

4 Journal of Robotics

Data acquisitiondemonstration and recording

Preprocessingdenoising invalid datadetection sequencealignment

Encoding and retrievalGMM GMR

Control policymetric of imitationoptimization

Robot

Training data What to imitate How to imitate

Odom data

Vision data

x 120596

d

120585j 120583k Σk

120585xs linear velocity

120585120579s angular velocity

Figure 2 System architecture of obstacle avoidance learning

adjusting parameters (120601 120583 Σ)The likelihood function can bewritten as

119897 (120601 120583 Σ) =

119898

sum

119894=1

log119901 (119909(119894) 120601 120583 Σ)

=

119898

sum

119894=1

log119896

sum

119911(119894)=1

119901 (119909(119894)

| 119911(119894) 120583 Σ) 119901 (119911

(119894) 120601)

(4)

Maximize (4) and then we can find the right modelparameters To realize this purpose typically we find thepartial derivatives for each of the parameters and set themequal to zero but actually we find that we cannot get a closed-loop solution for this problem using the above method Themain reason is that 119911 is unknown if 119911 is known the modeldegrades to a Gaussian discriminant model following thepreviously mentioned steps we can obtain a closed-loopsolutionWewill introduce a kind of commonly used iterativealgorithm to solve the problem in (4)

33 EM for GMM Expectation Maximization (EM) is aniterative algorithm mainly including two steps which arecalled E-step andM-stepThemain idea is to find a tight lowerbound function of the objective function and then maximizethe lower bound function so that the objective functioncan be maximized indirectly The reason for introducing alower bound function is that finding the extreme value ofthe lower bound function is much easier compared to theoriginal function For GMM the E-step tries to guess thevalue of 119911(119894) equivalent to finding out which components thecurrent sample belongs to and then the model parametersare updated based on our conjecture The algorithm can besummarized as follows

E-step for each 119894 and 119895 compute the poster-probability of119911(119894)

119908119894

119895fl 119901 (119911

(119894)= 119895 | 119909

(119894) 120601 120583 Σ) (5)

M-step modify our model parameters

120601119895fl

1

119898

119898

sum

119894=1

119908(119894)

119895

120583119895fl

sum119898

119894=1119908(119894)

119895119909(119894)

sum119898

119894=1119908(119894)

119895

Σ119895fl

sum119898

119894=1119908(119894)

119895(119909(119894)

minus 120583119895) (119909(119894)

minus 120583119895)119879

sum119898

119894=1119908(119894)

119895

(6)

Note that the value of 119908119894

119895can be computed using the

following equation

119908119894

119895fl 119901 (119911

(119894)= 119895 | 119909

(119894) 120601 120583 Σ)

=

119901 (119909(119894)

| 119911(119894)

= 119895 120583 Σ) 119901 (119911(119894)

= 119895 120601)

sum119896

119897=1119901 (119909(119894) | 119911(119894) = 119897 120583 Σ) 119901 (119911(119894) = 119897 120601)

(7)

iterating between (6) and (7) until convergence whichmeansthat the changes ofmodel parameters are less than a threshold120598 or the value of likelihood function changes in a very narrowrange

4 The Architecture of ObstacleAvoidance Learning

Figure 2 shows the architecture of obstacle avoidance learningand the information flow between different modules Thewhole system contains three modules data acquisition andpreprocessing constraints extraction and trajectory retrievaland optimal control strategy derivation Mileage informationis collected by built-in sensors such as odometers anddistance information is acquired by Kinect Through noisefiltering invalid data reduction and sequence alignment forthe original data we get the input for the learning machineThen the input signal is encoded by GMM and a general

Journal of Robotics 5

Start

GoalObstacle

x120579

120596

dro

drg

Figure 3 Useful data collected by sensors Line in blue showsthe demonstrated trajectory Sectors with a fan shape in shallowyellow color represent the visual field of the Kinect 119909 120596 119889ro

119889rg

represent location orientation and distance information of therobot respectively

trajectory is generated by GMR The general trajectorysatisfies all the constraints for the task Eventually a controlleris designed to follow the expected input trajectory and outputthe control commands to the robot Subsequent sections willdescribe each part in detail

41 DataAcquisition Theexperimental data is acquired formaTurtlebot robot Turtlebot is awidely used research platformwhich is compatible with ROS integrating a variety ofsensors such as odometer and extensible vision sensors Wecan develop many kinds of applications by using Turtlebotsuch as mapping and navigation In this paper we willimplement our obstacle avoidance algorithm on the bases ofROS development kit for TurtlebotThe teaching process canbe achieved by three ways (i) using a remote joystick (ii)with the aid of Turtlebot development kit for the navigationalgorithms simulating some trajectories as training samplesand (iii) by means of the following apps implemented onTurtlebot the teacher can guide robot to avoid obstacles andrecord sensor data as samples In this paper we adopt thesecond method which means that we create simulated envi-ronment in software and then make robot execute multipleobstacle avoidance tasks in the simulated environment usingbuilt-in navigation algorithms During this process linearvelocity and angular velocity are obtained by built-in sensorslike odometer Using Kinect we can get the distance betweenthe robot and obstacles as well as distance from the robot tothe target

We assume that the important sensing information comesfrom (i) states of the robot defined as linear velocity andangular velocity (ii) the robotrsquos absolute location (iii) dis-tance between the robot and obstacles noting that the Kinectreturns many pieces of distance information representingthe relative position of the obstacle in a specific range andwe choose the shortest distance as our reference value and(iv) distance between the robot and the target which canbe computed indirectly As illustrated in Figure 3 we use

(119909 120596 119889ro 119889

rg) to represent the sensor informationmentioned

above respectively Here the location information and veloc-ity information are redundantWe think it is necessary to takeboth kinds of information into consideration for the reasonthat location information is very important and direct foran obstacle avoidance task In addition choosing variablesrelating to the task as many as possible is generally verynecessary in the initial stage of a task

In the experiment demonstration is conducted 119899 timesIn each demonstration 119879 data points are recorded so thedataset contains 119873 = 119899 times 119879 points in total Each samplepoint is a collection of variables represented as119863 = 119909 120596 119889in which 119909 stands for the location of robot 120596 stands for thecurrent orientation and 119889 stands for distance Each variablecontains a one-dimensional temporal value and (119889 minus 1)-dimensional spatial values that is 119909 = 119909

119904 119909119905 120596 = 120596

119904 120596119905

119889ro

= 119889ro119904 119889

ro119905 and 119889

rg= 119889

rg119904 119889

rg119905 119909119904isin R119873times2 represent

location 120596 isin R represent orientation 119889ro119904

isin R119873times2 representdistance between robot and obstacles and 119889

rg119904

isin R119873times2

represent distance between robot and goal state Supposingcoordinates of the goal are 119900

119904119894 then we have

119889rg119904

= 119909119904119894119895

minus 119900119904119894 forall119894 isin 1 119899 forall119895 isin 1 119879 (8)

where 119894 indicates the trajectory number and 119895 indicatestime in a trajectory We can rewrite the above equation forvelocities

rg119904

= 119904119894119895

minus 119904119894 (9)

42 Data Preprocessing Because of noise and bad demon-stration quality there exist seriousmismatch and invalid datain the original data As shown in Figure 4 there is no changeat the beginning and end parts of these curves so this partof data is useless To reduce computation cost it is a wisechoice to abandon these data In addition each trajectory hasdifferent lengthwe should resample them to a uniform lengthso that we can handle them easier in the subsequent stepsSo before inputting the data to our algorithms preprocessingsteps including noise filter useless data reduction and resam-pling are conducted Figure 4 displays differences between theoriginal data and the processed data

43 Trajectory Encoding and Retrieval Considering a data-point in training set 120585

119895= 120585119905119895 120585119904119895119873

119895=1isin 119863 120585

119895is a119863-dimensi-

onal vector We encode the data with GMM supposing wehave 119870 Gaussian components The probability items in (4)can be represented as

119901 (119896) = 120587119896

119901 (120585119895| 119896) = N (120585

119895 120583119896 Σ119896)

=1

radic(2120587)119863 1003816100381610038161003816Σ119896

1003816100381610038161003816

119890minus(12)((120585119895minus120583119896)

119879summinus1119896 (120585119895minus120583119896))

(10)

in which 120587119896 120583119896 Σ119896 are model parameters representing

prior-probability mean value and variance respectively

6 Journal of Robotics

0

2

4

6x

(m)

minus1

0

1

120596(r

ads

)

0

2

4

dro

(m)

0

2

4

drg

(m)

100 2000t (s)

100 2000t (s)

100 2000t (s)

100 2000t (s)

(a)

0

2

4

6

x(m

)

50 100 150 2000t (s)

minus1

0

1

120596(r

ads

)

50 100 150 2000t (s)

0

2

4

dro

(m)

50 100 150 2000t (s)

0

2

4

drg

(m)

50 100 150 2000t (s)

(b)

Figure 4 Comparison of original data and processed data (a) shows the original data Lines in red represent cutting position (b) displaysvalid data after processing

To solve these parameters the EM algorithm introducedin the previous section can be used When EM algorithmconverges to a given value we get the right 120587

119896 120583119896 Σ119896 that

is we encode the trajectory data successfullyIn order to get a general barrier-free trajectory which

is given an input which can be temporal signals or otherkinds of sensor data then a barrier-free path is generatedaccording to the given input a natural idea is to divide thevariable 120585 into two parts (120585I

119905 120585

O119905) which represents input and

output respectively and then trajectory retrieval problemcanbe formulized as follows given 120585 sim sum

119870

119894=1120587119894N(120583119894 Σ119894) and 120585

I119905

solve the density probability distribution for 120585O119905 Solution to

this problem is called Gaussian mixture regression Regard-ing 120585

I119905as query points 120585O

119905can be estimated using regression

methods In this paper we use superscriptsI andO standingfor the input and output part for each variable With thisnotation a block decomposition of the data-point 120585

119905 vectors

120583119896 and matrices Σ

119905can be written as

120585119905= [

120585I119896

120585O119896

]

120583119896= [

120583I119896

120583O119896

]

Σ119896= [

ΣI119896

ΣIO119896

ΣOI119896

ΣO119896

]

(11)

Based on (3) for a time step 119905 the mean vector andcovariance matrix for the 119896th component are

O119896(120585

I119905) = 120583

O119896+ Σ

OI119896

ΣIminus1119896

(120585I119905

minus 120583I119896)

ΣO

119896= Σ

O119896minus Σ

OI119896

ΣIminus1119896

ΣIO119896

(12)

Taking 119870 components into consideration the mean vec-tor and covariance matrix for distribution 120585

O119905given 120585

I119905are

119875 (120585O119905| 120585

I119905) sim

119870

sum

119896=1

ℎ119896(120585

I119905)N (

O119896(120585

I119905) Σ

O

119896) (13)

Journal of Robotics 7

where

ℎ119896(120585

I119905) =

120587119894N (120585

I119905

| 120583I119894 Σ

I119894)

sum119870

119896=1120587119896N (120585

I119905

| 120583I119896 Σ

I119896)

(14)

Equation (13) represents a multimodal distributionwhich can be approximated by the following Gaussian dis-tribution

119875 (120585O119905| 120585

I119905) = N (

O119905 Σ

O

119905) with

120583O119905=

119870

sum

119894=1

ℎ119894(120585

I119905) 120583

O119894(120585

I119905)

ΣO

119905=

119870

sum

119894=1

ℎ119894(120585

I119905) (Σ

O

119894+

O119894(120585

I119905)

O119894(120585

I119905)119879

)

minus O119905O119879119905

(15)

Substituting 120585I119905into 120585

O119905using equations described above

a general trajectory is obtained The trajectory not only canavoid obstacles but also can be executed by robots due to itsgood smooth property

44 Finding an Optimum Control Policy Through the previ-ous section we can obtain a general trajectory and this is aplanning problem How to combine planning with controlIn other words how to find a set of control inputs that canmake the robot follow the general trajectory accurately Forthe time-based GMR 120585O

119905is equal to 120585

119904 and 119904 represents

spatial location information Using 119904represents the desired

trajectory and 120585119904represents the actual reached location In

order to evaluate the tracking performance a simple idea isusing the weighted Euclidean distance

119863minus1

sum

119894=1

120596119894(120585119904119894

minus 119904119894)2

= (120585119904minus 119904)119879

119882(120585119904minus 119904) (16)

Naturally the above equation can be applied to mul-tiple variables For this paper we have four variables Let119909

120596

119889ro

119889rg

represent the desired trajectory the actualtracking performance 119875 can be expressed as

119875 = (119909

119904minus 119862119909)119879

119882119909(119909

119904minus 119862119909)

+ (120596

119904minus 119862120596)119879

119882120596(120596

119904minus 119862120596)

+ (119889ro

119904minus 119862119889ro)

119879

119882119889ro(119889ro

119904minus 119862119889ro)

+ (119889rg

119904minus 119862119889rg)

119879

119882119889rg(119889rg

119904minus 119862119889rg)

(17)

where

119862lowast119894119895

= lowast

119904119894119895minus 120585lowast

119904119894119895minus1

forall119894 isin 1 119899 forall119895 isin 1 119879 foralllowast isin 119909 120596 119889ro 119889

rg

(18)

According to performance indicator 119875 finding a desiredcontrol policy is equivalent to maximizing 119875 Since 119875 isa quadratic function an analytical solution can be easilyobtained by gradient method But noting that there existconstraints between variables as shown in (9) maximizing(17) turns into a constrained optimization problem Usuallythis kind of problem can be solved by constructing a Lagrangefunction defined as follows

119871 (119909

119904 120596

119904 119889ro

119904 119889rg

119904 120582) = 119875 + 120582

119879(119909

119904minus 119889rg

119904) (19)

where120582 is Lagrangemultiplier Lettingnabla119871 = 0 solving partialderivatives for

119909

119904 119889rg

119904 we find

2119882119909(119909

119904minus 119862119909) + 120582 = 0

2119882119889rg(119889rg

119904minus 119862119889rg) minus 120582 = 0

(20)

Adding (20) we find

2119882119889rg(119889rg

119904minus 119862119889rg) + 2119882

119909(119909

119904minus 119862119909) = 0 (21)

Substituting 119909

119904= minus119889rg

119904into the above equation we find

119909

119904= (119882

119909minus 119882119889rg)minus1

(119882119909119862119909+ 119882119889rg119862119889rg) (22)

Weight matrices in (22) can be computed by solving theinverse matrix of the current covariance matrix namely

119882 = (ΣO119905)minus1

(23)

5 Simulation Experiments

Experiments are implemented by using a Turtlebot robotIn addition to the robotrsquos own hardware system a Kinectis integrated to obtain distance information The size of themap used in the simulation is 75 times 5m Two different mapsare created to verify the learning effects For the first mapthe obstacle locates in the middle of the map and a smallgap lies in the lower side and the robot can reach the targetplace through the gap The second map is much similar tothe first one except that there exist two obstacles in the mapTo get the destination the robot must avoid every obstacle inthe map Figure 5 shows the shape of the map Figures 5(a)and 5(b) represent the 2D map and Figures 5(c) and 5(d)show a 3D simulation environment which corresponds tothe 2D maps The color maps in maps (a) and (b) stand forcost cloud which provides reference for path planning Thewarm color such as yellow and red stands for a lower costTo the contrary the cool color tune stands for a higher cost Apath planer always tends to choose a trajectory with the leastcostThe cost value is calculated by built-in functions in ROSpackage For the convenience of description we use task 1 andtask 2 to represent the two different obstacle avoidance tasksin maps (a) and (b) It is worth mentioning that we select rel-atively simple environment for both tasks for the reason that

8 Journal of Robotics

StartGoal

x

y

(a)

x

y

(b)

(c) (d)

Figure 5 Twodifferentmaps used in our simulation (a) and (b) are actualmaps generated by software (c) and (d) are simulation environmentcorresponding to maps above them

4 62x (m)

0

1

2

3

4

5

y(m

)

(a)

4 62x (m)

0

1

2

3

4

5

y(m

)

(b)

Figure 6 Demonstrated trajectories in two situations (a) represents sampled data in task 1 (b) represents sampled data in task 2

we putmore emphasis on themechanism behind the obstacleavoidance whichmeans we can learn skills from simple envi-ronment but these skills can be reused in complex situations

Figure 6 shows the demonstrated trajectories in twodifferent situations Coordinates of goal states are set to (6 3)

and (6 2) for task 1 and task 2 respectively The start pointsfor task 1 are randomly generated by sampling from uniformdistributions between 05 and 25 for 119883-coordinates and 2

to 4 for 119884-coordinates The maximum range is 2m whichcan guarantee a good generalization performance for obstacleavoidance For task 2 the start points are generated by sam-pling from uniform distributions from 05 to 2 and 2 to 35

51 Experiments for Trajectory Encoding and Retrieval Fig-ure 7 shows the encoding and retrieval results for task 1 Theencoded variables are location information 119909 and distance

Journal of Robotics 9

GMM

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

1 2 3 40

drgx (m)

(a)

GMR

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

2 40

drgx (m)

(b)

Figure 7 Trajectory encoding and retrieval using GMM and GMR for task 1 (a) represents model encoding with GMM Ellipse in dark bluerepresents projections of Gaussian components (b) represents general trajectory generated by GMR Shades in red color represent generationregion

information 119889rg and both variables are 2-dimensional vec-

tors Number of Gaussian components is set to 6 It isworth noting that the optimal number of components 119870

in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding

to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889

rg As can be seen from the figures 119889rg shares

similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so

10 Journal of Robotics

minus1

0

1120596

(rad

s)

10 200

t (s)

minus1

0

1

120596(r

ads

)

10 200

t (s)

10 200

t (s)

0

5

dro

(m)

10 200

t (s)

0

2

4

dro

(m)

10 200

t (s)

minus5

0

5

drg x

(m)

minus5

0

5

drg x

(m)

10 200

t (s)

minus5

0

5

10 200

t (s)10 200

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7

domean value and covariance which verify the validity of themethod indirectly

Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles

the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully

Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect

Journal of Robotics 11

GMM GMR

1

2

3

4

y(m

)

3 4 5 62

x (m)

1

2

3

4

y(m

)

3 4 5 62

x (m)

minus2

minus1

0

1

drg y

(m)

10 3 42

drgx (m)

10 3 42

drgx (m)

minus2

minus1

0

1

drg y

(m)

Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2

52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity

119909

119904and angular velocity

120579

119904 Assuming we have a

desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute

119909

119904and

120579

119904 Iterating on

every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results

Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884

direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth

noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct

Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time

6 Conclusion

In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory

12 Journal of Robotics

minus2

0

2120596

(rad

s)

10 20 30 400

t (s)

minus2

0

2

120596(r

ads

)

10 20 30 400

t (s)

0

2

4

dro

(m)

10 20 30 400

t (s)10 20 30 400

t (s)

0

2

4

dro

(m)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

10 20 30 400

t (s)10 20 30 400

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 10 Encoding and retrieval for orientation and distance information in task 2

Expected pathActual path

2 3 4 5 6 71x (m)

05

1

15

2

25

3

35

4

y(m

)

Figure 11 Tracking performance using the derived policy

Journal of Robotics 13

Expected valueCalculated value

50 100 150 2000t (s)

50 100 150 2000t (s)

minus202

Vx

(ms

)

minus101

Vy

(ms

)

(a) Linear velocity

Expected valueCalculated value

50 100 150 2000t (s)

minus202

120596(r

ads

)

(b) Angular velocity

Figure 12 Comparisons between the expected angular velocity andcalculated value

trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms

For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved

Competing Interests

The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)

References

[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo

World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007

[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012

[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for

fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989

[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986

[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006

[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009

[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012

[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005

[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010

[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997

[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003

[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004


Figure 3: Useful data collected by sensors. The line in blue shows the demonstrated trajectory; the fan-shaped sectors in light yellow represent the visual field of the Kinect. $x$, $\omega$, $d_{ro}$, and $d_{rg}$ represent the location, orientation, and distance information of the robot, respectively.

A general trajectory is generated by GMR; it satisfies all the constraints of the task. Eventually, a controller is designed to follow the expected input trajectory and output the control commands to the robot. Subsequent sections describe each part in detail.

4.1. Data Acquisition. The experimental data is acquired from a Turtlebot robot. Turtlebot is a widely used research platform which is compatible with ROS, integrating a variety of sensors such as an odometer and extensible vision sensors. Many kinds of applications, such as mapping and navigation, can be developed with Turtlebot. In this paper we implement our obstacle avoidance algorithm on the basis of the ROS development kit for Turtlebot. The teaching process can be achieved in three ways: (i) using a remote joystick; (ii) with the aid of the Turtlebot development kit for navigation algorithms, simulating some trajectories as training samples; and (iii) by means of follower apps implemented on Turtlebot, with which the teacher can guide the robot to avoid obstacles and record the sensor data as samples. In this paper we adopt the second method: we create a simulated environment in software and make the robot execute multiple obstacle avoidance tasks in it using the built-in navigation algorithms. During this process, linear velocity and angular velocity are obtained by built-in sensors such as the odometer. Using the Kinect, we can get the distance between the robot and obstacles as well as the distance from the robot to the target.

We assume that the important sensing information comes from (i) the states of the robot, defined as linear velocity and angular velocity; (ii) the robot's absolute location; (iii) the distance between the robot and obstacles, noting that the Kinect returns many pieces of distance information representing the relative position of the obstacle in a specific range, of which we choose the shortest distance as our reference value; and (iv) the distance between the robot and the target, which can be computed indirectly. As illustrated in Figure 3, we use $(x, \omega, d_{ro}, d_{rg})$ to represent the sensor information mentioned above. Here the location information and velocity information are redundant. We think it is necessary to take both kinds of information into consideration because location information is very important and direct for an obstacle avoidance task. In addition, choosing as many task-related variables as possible is generally necessary in the initial stage of a task.

In the experiment, demonstration is conducted $n$ times. In each demonstration, $T$ data points are recorded, so the dataset contains $N = n \times T$ points in total. Each sample point is a collection of variables represented as $D = \{x, \omega, d\}$, in which $x$ stands for the location of the robot, $\omega$ stands for the current orientation, and $d$ stands for distance. Each variable contains a one-dimensional temporal value and $(D-1)$-dimensional spatial values, that is, $x = \{x_s, x_t\}$, $\omega = \{\omega_s, \omega_t\}$, $d_{ro} = \{d_{ro,s}, d_{ro,t}\}$, and $d_{rg} = \{d_{rg,s}, d_{rg,t}\}$. Here $x_s \in \mathbb{R}^{N \times 2}$ represents location, $\omega \in \mathbb{R}$ represents orientation, $d_{ro,s} \in \mathbb{R}^{N \times 2}$ represents the distance between robot and obstacles, and $d_{rg,s} \in \mathbb{R}^{N \times 2}$ represents the distance between robot and goal state. Supposing the coordinates of the goal are $o_{s_i}$, we have

$$d_{rg,s} = x_{s_{ij}} - o_{s_i}, \quad \forall i \in \{1, \dots, n\},\ \forall j \in \{1, \dots, T\}, \tag{8}$$

where $i$ indicates the trajectory number and $j$ indicates the time step within a trajectory. We can rewrite the above equation for velocities:

$$\dot{d}_{rg,s} = \dot{x}_{s_{ij}} - \dot{o}_{s_i}. \tag{9}$$
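To make the data layout concrete, the following is a minimal sketch (in Python with NumPy, which the original work does not specify) of how the $N = n \times T$ samples and the goal distance of (8) might be assembled; the array names, the random placeholder data, and the `goal` coordinates are illustrative assumptions, not the authors' code.

```python
import numpy as np

n, T = 10, 200                      # demonstrations and points per demonstration
N = n * T                           # total number of samples

# Hypothetical raw recordings, one array per demonstration.
xs    = [np.random.rand(T, 2) for _ in range(n)]   # planar locations x_s
omega = [np.random.rand(T)    for _ in range(n)]   # orientations (1-D)
d_ro  = [np.random.rand(T, 2) for _ in range(n)]   # robot-obstacle distances

goal = np.array([6.0, 3.0])         # assumed goal coordinates o_s (task 1)

# Equation (8): robot-goal distance for every trajectory i and time step j.
d_rg = [x - goal for x in xs]

# Stack everything into the N x D dataset, prepending the temporal value t.
t = np.tile(np.arange(T), n)[:, None]
D = np.hstack([t,
               np.vstack(xs),
               np.concatenate(omega)[:, None],
               np.vstack(d_ro),
               np.vstack(d_rg)])    # shape (N, 8)
```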

4.2. Data Preprocessing. Because of noise and poor demonstration quality, there are serious mismatches and invalid data in the original recordings. As shown in Figure 4, there is no change at the beginning and end parts of these curves, so this part of the data is useless. To reduce computation cost, it is wise to discard these data. In addition, each trajectory has a different length; we should resample them to a uniform length so that they are easier to handle in subsequent steps. So, before feeding the data to our algorithms, preprocessing steps including noise filtering, useless data removal, and resampling are conducted. Figure 4 displays the differences between the original data and the processed data.
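As a rough illustration (our assumption of one reasonable implementation, not the exact pipeline used here), trimming the static head and tail of a trajectory and resampling it to a fixed length could look like this:

```python
import numpy as np

def preprocess(traj, target_len=200, eps=1e-3):
    """Trim static head/tail of a (T, D) trajectory and resample to target_len."""
    # Keep only the span where consecutive samples actually change.
    moving = np.linalg.norm(np.diff(traj, axis=0), axis=1) > eps
    idx = np.flatnonzero(moving)
    traj = traj[idx[0]: idx[-1] + 2]          # cut the useless flat segments

    # Resample each dimension to a uniform length by linear interpolation.
    old_t = np.linspace(0.0, 1.0, len(traj))
    new_t = np.linspace(0.0, 1.0, target_len)
    return np.column_stack([np.interp(new_t, old_t, traj[:, d])
                            for d in range(traj.shape[1])])
```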

4.3. Trajectory Encoding and Retrieval. Consider a data point in the training set, $\xi_j = \{\xi_{t_j}, \xi_{s_j}\}_{j=1}^{N} \in D$, where $\xi_j$ is a $D$-dimensional vector. We encode the data with a GMM, supposing we have $K$ Gaussian components. The probability items in (4) can be represented as

$$p(k) = \pi_k,$$
$$p(\xi_j \mid k) = \mathcal{N}(\xi_j; \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^D \left|\Sigma_k\right|}}\, e^{-\frac{1}{2}\left((\xi_j - \mu_k)^T \Sigma_k^{-1} (\xi_j - \mu_k)\right)}, \tag{10}$$

in which $\pi_k$, $\mu_k$, $\Sigma_k$ are the model parameters representing the prior probability, mean value, and variance, respectively.
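A compact way to realize this encoding step, assuming the scikit-learn library is acceptable (the paper does not state which implementation was used), is sketched below; `K = 6` follows the value reported later for task 1, and `D` stands in for the preprocessed dataset.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

D = np.random.rand(2000, 8)          # placeholder for the (N, dim) dataset

gmm = GaussianMixture(n_components=6, covariance_type='full',
                      max_iter=200, random_state=0)
gmm.fit(D)                           # EM estimates pi_k, mu_k, Sigma_k of (10)

pi_k, mu_k, Sigma_k = gmm.weights_, gmm.means_, gmm.covariances_
```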

Figure 4: Comparison of original data and processed data (panels plot $x$, $\omega$, $d_{ro}$, and $d_{rg}$ against $t$). (a) shows the original data; lines in red represent the cutting positions. (b) displays the valid data after processing.

To solve for these parameters, the EM algorithm introduced in the previous section can be used. When the EM algorithm converges, we obtain the right $\pi_k$, $\mu_k$, $\Sigma_k$; that is, we encode the trajectory data successfully.

In order to get a general barrier-free trajectory, that is, given an input, which can be a temporal signal or another kind of sensor data, to generate a barrier-free path according to that input, a natural idea is to divide the variable $\xi$ into two parts, $(\xi_t^{I}, \xi_t^{O})$, representing input and output, respectively. The trajectory retrieval problem can then be formulated as follows: given $\xi \sim \sum_{i=1}^{K} \pi_i \mathcal{N}(\mu_i, \Sigma_i)$ and $\xi_t^{I}$, solve for the probability density of $\xi_t^{O}$. The solution to this problem is called Gaussian mixture regression: regarding $\xi_t^{I}$ as query points, $\xi_t^{O}$ can be estimated using regression methods. In this paper we use the superscripts $I$ and $O$ to stand for the input and output part of each variable. With this notation, a block decomposition of the data point $\xi_t$, the vectors $\mu_k$, and the matrices $\Sigma_k$ can be written as

$$\xi_t = \begin{bmatrix} \xi_t^{I} \\ \xi_t^{O} \end{bmatrix}, \qquad \mu_k = \begin{bmatrix} \mu_k^{I} \\ \mu_k^{O} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_k^{I} & \Sigma_k^{IO} \\ \Sigma_k^{OI} & \Sigma_k^{O} \end{bmatrix}. \tag{11}$$

Based on (3), for a time step $t$, the mean vector and covariance matrix of the $k$th component are

$$\hat{\mu}_k^{O}(\xi_t^{I}) = \mu_k^{O} + \Sigma_k^{OI} \left(\Sigma_k^{I}\right)^{-1} \left(\xi_t^{I} - \mu_k^{I}\right),$$
$$\hat{\Sigma}_k^{O} = \Sigma_k^{O} - \Sigma_k^{OI} \left(\Sigma_k^{I}\right)^{-1} \Sigma_k^{IO}. \tag{12}$$

Taking all $K$ components into consideration, the distribution of $\xi_t^{O}$ given $\xi_t^{I}$ is

$$P(\xi_t^{O} \mid \xi_t^{I}) \sim \sum_{k=1}^{K} h_k(\xi_t^{I})\, \mathcal{N}\!\left(\hat{\mu}_k^{O}(\xi_t^{I}),\, \hat{\Sigma}_k^{O}\right), \tag{13}$$

where

$$h_k(\xi_t^{I}) = \frac{\pi_k \mathcal{N}(\xi_t^{I} \mid \mu_k^{I}, \Sigma_k^{I})}{\sum_{k=1}^{K} \pi_k \mathcal{N}(\xi_t^{I} \mid \mu_k^{I}, \Sigma_k^{I})}. \tag{14}$$

Equation (13) represents a multimodal distribution, which can be approximated by the following Gaussian distribution:

$$P(\xi_t^{O} \mid \xi_t^{I}) = \mathcal{N}(\hat{\mu}_t^{O}, \hat{\Sigma}_t^{O}), \quad \text{with}$$
$$\hat{\mu}_t^{O} = \sum_{i=1}^{K} h_i(\xi_t^{I})\, \hat{\mu}_i^{O}(\xi_t^{I}),$$
$$\hat{\Sigma}_t^{O} = \sum_{i=1}^{K} h_i(\xi_t^{I}) \left(\hat{\Sigma}_i^{O} + \hat{\mu}_i^{O}(\xi_t^{I})\, \hat{\mu}_i^{O}(\xi_t^{I})^{T}\right) - \hat{\mu}_t^{O} \hat{\mu}_t^{O\,T}. \tag{15}$$

Substituting $\xi_t^{I}$ into the equations described above yields $\xi_t^{O}$, and thereby a general trajectory is obtained. The trajectory not only avoids obstacles but also can be executed by robots thanks to its good smoothness.

4.4. Finding an Optimum Control Policy. Through the previous section we can obtain a general trajectory, but this is a planning problem. How do we combine planning with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{O}$ is equal to $\hat{\xi}_s$, where the subscript $s$ denotes spatial location information. Using $\hat{\xi}_s$ for the desired trajectory and $\xi_s$ for the actually reached location, a simple idea for evaluating the tracking performance is the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \left(\xi_{s_i} - \hat{\xi}_{s_i}\right)^2 = \left(\xi_s - \hat{\xi}_s\right)^T W \left(\xi_s - \hat{\xi}_s\right). \tag{16}$$

Naturally, the above equation can be applied to multiple variables; in this paper we have four. Letting $\hat{x}_s$, $\hat{\omega}_s$, $\hat{d}_{ro,s}$, and $\hat{d}_{rg,s}$ denote the desired trajectory, the actual tracking performance $P$ can be expressed as

$$\begin{aligned} P ={} & (\dot{x}_s - C_x)^T W_x (\dot{x}_s - C_x) + (\dot{\omega}_s - C_\omega)^T W_\omega (\dot{\omega}_s - C_\omega) \\ & + (\dot{d}_{ro,s} - C_{d_{ro}})^T W_{d_{ro}} (\dot{d}_{ro,s} - C_{d_{ro}}) + (\dot{d}_{rg,s} - C_{d_{rg}})^T W_{d_{rg}} (\dot{d}_{rg,s} - C_{d_{rg}}), \end{aligned} \tag{17}$$

where

$$C_{*_{ij}} = \hat{\xi}^{*}_{s_{ij}} - \xi^{*}_{s_{ij-1}}, \quad \forall i \in \{1, \dots, n\},\ \forall j \in \{1, \dots, T\},\ \forall * \in \{x, \omega, d_{ro}, d_{rg}\}. \tag{18}$$

According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by a gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as

$$L(\dot{x}_s, \dot{\omega}_s, \dot{d}_{ro,s}, \dot{d}_{rg,s}, \lambda) = P + \lambda^T (\dot{x}_s - \dot{d}_{rg,s}), \tag{19}$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to $\dot{x}_s$ and $\dot{d}_{rg,s}$, we find

$$2 W_x (\dot{x}_s - C_x) + \lambda = 0,$$
$$2 W_{d_{rg}} (\dot{d}_{rg,s} - C_{d_{rg}}) - \lambda = 0. \tag{20}$$

Adding the two equations in (20) eliminates $\lambda$:

$$2 W_{d_{rg}} (\dot{d}_{rg,s} - C_{d_{rg}}) + 2 W_x (\dot{x}_s - C_x) = 0. \tag{21}$$

Substituting $\dot{x}_s = -\dot{d}_{rg,s}$ into the above equation, we find

$$\dot{x}_s = \left(W_x - W_{d_{rg}}\right)^{-1} \left(W_x C_x + W_{d_{rg}} C_{d_{rg}}\right). \tag{22}$$

The weight matrices in (22) can be computed as the inverse of the current covariance matrix retrieved by GMR, namely,

$$W = \left(\hat{\Sigma}_t^{O}\right)^{-1}. \tag{23}$$
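Reading (18), (22), and (23) together, one step of the resulting controller might be computed as in the sketch below. This is a Python/NumPy illustration under our own assumptions about array shapes (2-D `x` and `d_rg` blocks at the ends of the GMR output vector); `W_x` and `W_drg` come from the GMR covariance of (23), and `C_x`, `C_drg` from differences between the desired point and the last reached point as in (18):

```python
import numpy as np

def control_step(xi_hat, xi_prev, Sigma_out):
    """One velocity command from the derived policy, eqs. (18), (22), (23)."""
    # Split the desired GMR output / last reached state into x and d_rg blocks.
    x_hat,  drg_hat  = xi_hat[0:2],  xi_hat[-2:]
    x_prev, drg_prev = xi_prev[0:2], xi_prev[-2:]

    # Eq. (23): weights are inverses of the corresponding covariance blocks.
    W_x   = np.linalg.inv(Sigma_out[0:2, 0:2])
    W_drg = np.linalg.inv(Sigma_out[-2:, -2:])

    # Eq. (18): C terms from the desired point minus the previous point.
    C_x, C_drg = x_hat - x_prev, drg_hat - drg_prev

    # Eq. (22): optimal velocity command (assumes W_x - W_drg is invertible).
    return np.linalg.solve(W_x - W_drg, W_x @ C_x + W_drg @ C_drg)
```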

5. Simulation Experiments

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map and a small gap lies on the lower side; the robot can reach the target place through the gap. The second map is much like the first one, except that there are two obstacles in it; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) represent the 2D maps, and Figures 5(c) and 5(d) show the 3D simulation environments corresponding to the 2D maps. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red stand for a lower cost; on the contrary, cool color tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two different obstacle avoidance tasks in maps (a) and (b).

Figure 5: Two different maps used in our simulation. (a) and (b) are actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.

Figure 6: Demonstrated trajectories in two situations. (a) represents the sampled data in task 1; (b) represents the sampled data in task 2.

It is worth mentioning that we select relatively simple environments for both tasks because we put more emphasis on the mechanism behind obstacle avoidance: skills can be learned in a simple environment and then reused in complex situations.

Figure 6 shows the demonstrated trajectories in the two situations. Coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the $X$-coordinate and between 2 and 4 for the $Y$-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
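For reproducibility, the start-point sampling just described can be written out as follows (a trivial sketch; the demonstration count of 10 per task is our assumption, as the paper does not state it):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_starts(n, x_range, y_range):
    """Draw n random start points from per-axis uniform distributions."""
    return np.column_stack([rng.uniform(*x_range, n), rng.uniform(*y_range, n)])

starts_task1 = sample_starts(10, (0.5, 2.5), (2.0, 4.0))
starts_task2 = sample_starts(10, (0.5, 2.0), (2.0, 3.5))
```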

5.1. Experiments for Trajectory Encoding and Retrieval. Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance information $d_{rg}$; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross-validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we decide $K$ by using the cross-validation criterion. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; this indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM can successfully extract the useful constraints characterizing the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen from the figures, $d_{rg}$ shares similar distribution properties with $x$; this is because the two variables have a specific relationship with each other, and so do their mean values and covariances, which verifies the validity of the method indirectly.

Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) represents model encoding with GMM; the ellipses in dark blue represent projections of the Gaussian components. (b) represents the general trajectory generated by GMR; the shading in red represents the generation region.
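Since the choice of $K$ is left to a model selection criterion, the sketch below shows the BIC-based variant (a scikit-learn illustration of our own; the cross-validation variant used in this paper would instead score held-out log-likelihood):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

D = np.random.rand(2000, 4)          # placeholder for the encoded variables

# Fit models with an increasing number of components; keep the lowest BIC.
candidates = [GaussianMixture(n_components=k, covariance_type='full',
                              random_state=0).fit(D) for k in range(2, 12)]
best = min(candidates, key=lambda g: g.bic(D))
print('selected K =', best.n_components)
```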

Figure 8: Encoding and retrieval of the orientation and distance information ($\omega$, $d_{ro}$, $d_{rg,x}$, $d_{rg,y}$ against $t$) in task 1, sharing the same meaning as the charts in Figure 7.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. As we can see, coordinate values in the $X$ direction decrease monotonically to 0; this is because the robot keeps moving from left to right in the map. Coordinates in the $Y$ direction go up at the beginning and then drop to zero; this is because the robot encounters obstacles at the middle moment and, in order to avoid them, has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the law behind the data. In order to encode these laws, more Gaussian components are required; in our experiments, $K$ is set to 10 using the method described for task 1. As illustrated in these figures, no matter how complex the environment is, the method still achieves a very satisfactory effect.

Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d_{rg}$ in task 2.

5.2. Experiments for Finding the Control Strategy. Experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), the theories derived in the previous sections let us compute $\dot{x}_s$ and $\dot{\theta}_s$. Iterating at every time step, we get a control sequence. Comparing the calculated values with the sensor-recorded values, we can verify the validity of the derived results.

Figure 12 compares the calculated linear and angular velocities with the desired values. Linear velocity is divided into velocity in the $X$ direction and the $Y$ direction. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0; this is because the control inputs for Turtlebot are a linear velocity in the forward direction, that is, the $X$ direction, and an angular velocity. The line graph for the $Y$ velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The $X$ velocity follows the expected value almost perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) to achieve this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
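As an illustration of that alignment step (a generic DTW sketch of our own, not the authors' implementation), two velocity sequences of different lengths can be warped onto a common path as follows:

```python
import numpy as np

def dtw_path(a, b):
    """Dynamic time warping between two 1-D sequences; returns the aligned path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Backtrack from the corner to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]
```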

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
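The position trace in Figure 11 is obtained by integrating the commanded velocities. For a differential-drive robot like the Turtlebot, a simple forward-Euler integration of the standard unicycle kinematics (our assumption, with a hypothetical time step `dt` and start pose) reads:

```python
import numpy as np

def integrate_unicycle(v, omega, dt=0.1, pose0=(1.0, 2.0, 0.0)):
    """Integrate forward velocity v and angular velocity omega into (x, y) points."""
    x, y, theta = pose0
    points = [(x, y)]
    for vk, wk in zip(v, omega):
        x += vk * np.cos(theta) * dt
        y += vk * np.sin(theta) * dt
        theta += wk * dt
        points.append((x, y))
    return np.array(points)
```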

Figure 10: Encoding and retrieval of the orientation and distance information in task 2.

Figure 11: Tracking performance using the derived policy (expected path versus actual path).

Figure 12: Comparison between the expected and calculated values: (a) linear velocity; (b) angular velocity.

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel idea: a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction is a key issue, that is, how to make the robot understand the task. To achieve this, the Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.

Future work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, a time signal is used as input. A more reasonable idea is to use state information as input, so that the robot can change its decision according to the current environment, for example, using distance information as input and velocity as output, so that the control inputs are generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.

Competing Interests

The authors declare that the grants, scholarships, and/or funding mentioned do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Doctoral Startup Fund of Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning—a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me-measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 6: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

6 Journal of Robotics

0

2

4

6x

(m)

minus1

0

1

120596(r

ads

)

0

2

4

dro

(m)

0

2

4

drg

(m)

100 2000t (s)

100 2000t (s)

100 2000t (s)

100 2000t (s)

(a)

0

2

4

6

x(m

)

50 100 150 2000t (s)

minus1

0

1

120596(r

ads

)

50 100 150 2000t (s)

0

2

4

dro

(m)

50 100 150 2000t (s)

0

2

4

drg

(m)

50 100 150 2000t (s)

(b)

Figure 4 Comparison of original data and processed data (a) shows the original data Lines in red represent cutting position (b) displaysvalid data after processing

To solve these parameters the EM algorithm introducedin the previous section can be used When EM algorithmconverges to a given value we get the right 120587

119896 120583119896 Σ119896 that

is we encode the trajectory data successfullyIn order to get a general barrier-free trajectory which

is given an input which can be temporal signals or otherkinds of sensor data then a barrier-free path is generatedaccording to the given input a natural idea is to divide thevariable 120585 into two parts (120585I

119905 120585

O119905) which represents input and

output respectively and then trajectory retrieval problemcanbe formulized as follows given 120585 sim sum

119870

119894=1120587119894N(120583119894 Σ119894) and 120585

I119905

solve the density probability distribution for 120585O119905 Solution to

this problem is called Gaussian mixture regression Regard-ing 120585

I119905as query points 120585O

119905can be estimated using regression

methods In this paper we use superscriptsI andO standingfor the input and output part for each variable With thisnotation a block decomposition of the data-point 120585

119905 vectors

120583119896 and matrices Σ

119905can be written as

120585119905= [

120585I119896

120585O119896

]

120583119896= [

120583I119896

120583O119896

]

Σ119896= [

ΣI119896

ΣIO119896

ΣOI119896

ΣO119896

]

(11)

Based on (3) for a time step 119905 the mean vector andcovariance matrix for the 119896th component are

O119896(120585

I119905) = 120583

O119896+ Σ

OI119896

ΣIminus1119896

(120585I119905

minus 120583I119896)

ΣO

119896= Σ

O119896minus Σ

OI119896

ΣIminus1119896

ΣIO119896

(12)

Taking 119870 components into consideration the mean vec-tor and covariance matrix for distribution 120585

O119905given 120585

I119905are

119875 (120585O119905| 120585

I119905) sim

119870

sum

119896=1

ℎ119896(120585

I119905)N (

O119896(120585

I119905) Σ

O

119896) (13)

Journal of Robotics 7

where

ℎ119896(120585

I119905) =

120587119894N (120585

I119905

| 120583I119894 Σ

I119894)

sum119870

119896=1120587119896N (120585

I119905

| 120583I119896 Σ

I119896)

(14)

Equation (13) represents a multimodal distributionwhich can be approximated by the following Gaussian dis-tribution

119875 (120585O119905| 120585

I119905) = N (

O119905 Σ

O

119905) with

120583O119905=

119870

sum

119894=1

ℎ119894(120585

I119905) 120583

O119894(120585

I119905)

ΣO

119905=

119870

sum

119894=1

ℎ119894(120585

I119905) (Σ

O

119894+

O119894(120585

I119905)

O119894(120585

I119905)119879

)

minus O119905O119879119905

(15)

Substituting 120585I119905into 120585

O119905using equations described above

a general trajectory is obtained The trajectory not only canavoid obstacles but also can be executed by robots due to itsgood smooth property

44 Finding an Optimum Control Policy Through the previ-ous section we can obtain a general trajectory and this is aplanning problem How to combine planning with controlIn other words how to find a set of control inputs that canmake the robot follow the general trajectory accurately Forthe time-based GMR 120585O

119905is equal to 120585

119904 and 119904 represents

spatial location information Using 119904represents the desired

trajectory and 120585119904represents the actual reached location In

order to evaluate the tracking performance a simple idea isusing the weighted Euclidean distance

119863minus1

sum

119894=1

120596119894(120585119904119894

minus 119904119894)2

= (120585119904minus 119904)119879

119882(120585119904minus 119904) (16)

Naturally the above equation can be applied to mul-tiple variables For this paper we have four variables Let119909

120596

119889ro

119889rg

represent the desired trajectory the actualtracking performance 119875 can be expressed as

119875 = (119909

119904minus 119862119909)119879

119882119909(119909

119904minus 119862119909)

+ (120596

119904minus 119862120596)119879

119882120596(120596

119904minus 119862120596)

+ (119889ro

119904minus 119862119889ro)

119879

119882119889ro(119889ro

119904minus 119862119889ro)

+ (119889rg

119904minus 119862119889rg)

119879

119882119889rg(119889rg

119904minus 119862119889rg)

(17)

where

119862lowast119894119895

= lowast

119904119894119895minus 120585lowast

119904119894119895minus1

forall119894 isin 1 119899 forall119895 isin 1 119879 foralllowast isin 119909 120596 119889ro 119889

rg

(18)

According to performance indicator 119875 finding a desiredcontrol policy is equivalent to maximizing 119875 Since 119875 isa quadratic function an analytical solution can be easilyobtained by gradient method But noting that there existconstraints between variables as shown in (9) maximizing(17) turns into a constrained optimization problem Usuallythis kind of problem can be solved by constructing a Lagrangefunction defined as follows

119871 (119909

119904 120596

119904 119889ro

119904 119889rg

119904 120582) = 119875 + 120582

119879(119909

119904minus 119889rg

119904) (19)

where120582 is Lagrangemultiplier Lettingnabla119871 = 0 solving partialderivatives for

119909

119904 119889rg

119904 we find

2119882119909(119909

119904minus 119862119909) + 120582 = 0

2119882119889rg(119889rg

119904minus 119862119889rg) minus 120582 = 0

(20)

Adding (20) we find

2119882119889rg(119889rg

119904minus 119862119889rg) + 2119882

119909(119909

119904minus 119862119909) = 0 (21)

Substituting 119909

119904= minus119889rg

119904into the above equation we find

119909

119904= (119882

119909minus 119882119889rg)minus1

(119882119909119862119909+ 119882119889rg119862119889rg) (22)

Weight matrices in (22) can be computed by solving theinverse matrix of the current covariance matrix namely

119882 = (ΣO119905)minus1

(23)

5 Simulation Experiments

Experiments are implemented by using a Turtlebot robotIn addition to the robotrsquos own hardware system a Kinectis integrated to obtain distance information The size of themap used in the simulation is 75 times 5m Two different mapsare created to verify the learning effects For the first mapthe obstacle locates in the middle of the map and a smallgap lies in the lower side and the robot can reach the targetplace through the gap The second map is much similar tothe first one except that there exist two obstacles in the mapTo get the destination the robot must avoid every obstacle inthe map Figure 5 shows the shape of the map Figures 5(a)and 5(b) represent the 2D map and Figures 5(c) and 5(d)show a 3D simulation environment which corresponds tothe 2D maps The color maps in maps (a) and (b) stand forcost cloud which provides reference for path planning Thewarm color such as yellow and red stands for a lower costTo the contrary the cool color tune stands for a higher cost Apath planer always tends to choose a trajectory with the leastcostThe cost value is calculated by built-in functions in ROSpackage For the convenience of description we use task 1 andtask 2 to represent the two different obstacle avoidance tasksin maps (a) and (b) It is worth mentioning that we select rel-atively simple environment for both tasks for the reason that

8 Journal of Robotics

StartGoal

x

y

(a)

x

y

(b)

(c) (d)

Figure 5 Twodifferentmaps used in our simulation (a) and (b) are actualmaps generated by software (c) and (d) are simulation environmentcorresponding to maps above them

4 62x (m)

0

1

2

3

4

5

y(m

)

(a)

4 62x (m)

0

1

2

3

4

5

y(m

)

(b)

Figure 6 Demonstrated trajectories in two situations (a) represents sampled data in task 1 (b) represents sampled data in task 2

we putmore emphasis on themechanism behind the obstacleavoidance whichmeans we can learn skills from simple envi-ronment but these skills can be reused in complex situations

Figure 6 shows the demonstrated trajectories in twodifferent situations Coordinates of goal states are set to (6 3)

and (6 2) for task 1 and task 2 respectively The start pointsfor task 1 are randomly generated by sampling from uniformdistributions between 05 and 25 for 119883-coordinates and 2

to 4 for 119884-coordinates The maximum range is 2m whichcan guarantee a good generalization performance for obstacleavoidance For task 2 the start points are generated by sam-pling from uniform distributions from 05 to 2 and 2 to 35

51 Experiments for Trajectory Encoding and Retrieval Fig-ure 7 shows the encoding and retrieval results for task 1 Theencoded variables are location information 119909 and distance

Journal of Robotics 9

GMM

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

1 2 3 40

drgx (m)

(a)

GMR

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

2 40

drgx (m)

(b)

Figure 7 Trajectory encoding and retrieval using GMM and GMR for task 1 (a) represents model encoding with GMM Ellipse in dark bluerepresents projections of Gaussian components (b) represents general trajectory generated by GMR Shades in red color represent generationregion

information 119889rg and both variables are 2-dimensional vec-

tors Number of Gaussian components is set to 6 It isworth noting that the optimal number of components 119870

in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding

to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889

rg As can be seen from the figures 119889rg shares

similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so

10 Journal of Robotics

minus1

0

1120596

(rad

s)

10 200

t (s)

minus1

0

1

120596(r

ads

)

10 200

t (s)

10 200

t (s)

0

5

dro

(m)

10 200

t (s)

0

2

4

dro

(m)

10 200

t (s)

minus5

0

5

drg x

(m)

minus5

0

5

drg x

(m)

10 200

t (s)

minus5

0

5

10 200

t (s)10 200

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7

domean value and covariance which verify the validity of themethod indirectly

Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles

the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully

Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect

Journal of Robotics 11

GMM GMR

1

2

3

4

y(m

)

3 4 5 62

x (m)

1

2

3

4

y(m

)

3 4 5 62

x (m)

minus2

minus1

0

1

drg y

(m)

10 3 42

drgx (m)

10 3 42

drgx (m)

minus2

minus1

0

1

drg y

(m)

Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2

52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity

119909

119904and angular velocity

120579

119904 Assuming we have a

desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute

119909

119904and

120579

119904 Iterating on

every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results

Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884

direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth

noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct

Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time

6 Conclusion

In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory

12 Journal of Robotics

minus2

0

2120596

(rad

s)

10 20 30 400

t (s)

minus2

0

2

120596(r

ads

)

10 20 30 400

t (s)

0

2

4

dro

(m)

10 20 30 400

t (s)10 20 30 400

t (s)

0

2

4

dro

(m)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

10 20 30 400

t (s)10 20 30 400

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 10 Encoding and retrieval for orientation and distance information in task 2

Expected pathActual path

2 3 4 5 6 71x (m)

05

1

15

2

25

3

35

4

y(m

)

Figure 11 Tracking performance using the derived policy

Journal of Robotics 13

Expected valueCalculated value

50 100 150 2000t (s)

50 100 150 2000t (s)

minus202

Vx

(ms

)

minus101

Vy

(ms

)

(a) Linear velocity

Expected valueCalculated value

50 100 150 2000t (s)

minus202

120596(r

ads

)

(b) Angular velocity

Figure 12 Comparisons between the expected angular velocity andcalculated value

trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms

For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved

Competing Interests

The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)

References

[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo

World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007

[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012

[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for

fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989

[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986

[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006

[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009

[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012

[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005

[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010

[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997

[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003

[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004

[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007

[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995

[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998

[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005

14 Journal of Robotics

[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003

[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005

[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997

[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002

[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002

[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005

[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009

[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 7: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

Journal of Robotics 7

where

ℎ119896(120585

I119905) =

120587119894N (120585

I119905

| 120583I119894 Σ

I119894)

sum119870

119896=1120587119896N (120585

I119905

| 120583I119896 Σ

I119896)

(14)

Equation (13) represents a multimodal distributionwhich can be approximated by the following Gaussian dis-tribution

119875 (120585O119905| 120585

I119905) = N (

O119905 Σ

O

119905) with

120583O119905=

119870

sum

119894=1

ℎ119894(120585

I119905) 120583

O119894(120585

I119905)

ΣO

119905=

119870

sum

119894=1

ℎ119894(120585

I119905) (Σ

O

119894+

O119894(120585

I119905)

O119894(120585

I119905)119879

)

minus O119905O119879119905

(15)

Substituting 120585I119905into 120585

O119905using equations described above

a general trajectory is obtained The trajectory not only canavoid obstacles but also can be executed by robots due to itsgood smooth property

4.4. Finding an Optimum Control Policy. The previous section gives us a general trajectory; this solves the planning problem. How can planning be combined with control? In other words, how do we find a set of control inputs that make the robot follow the general trajectory accurately? For the time-based GMR, $\xi_t^{O}$ is equal to $\xi_s$, where $s$ denotes spatial location information. Using $\hat{\xi}_s$ to represent the desired trajectory and $\xi_s$ the actually reached location, a simple idea for evaluating the tracking performance is the weighted Euclidean distance

$$\sum_{i=1}^{D-1} \omega_i \left( \xi_{s,i} - \hat{\xi}_{s,i} \right)^2 = (\xi_s - \hat{\xi}_s)^{T}\, W\, (\xi_s - \hat{\xi}_s). \tag{16}$$

Naturally, the above equation can be applied to multiple variables. In this paper we have four variables. Letting $\hat{x}$, $\hat{\omega}$, $\hat{d}_{ro}$, and $\hat{d}_{rg}$ represent the desired trajectory, the actual tracking performance $P$ can be expressed as

$$\begin{aligned} P ={}& (x_s - C_x)^{T} W_x (x_s - C_x) \\ &+ (\omega_s - C_\omega)^{T} W_\omega (\omega_s - C_\omega) \\ &+ (d_{ro,s} - C_{d_{ro}})^{T} W_{d_{ro}} (d_{ro,s} - C_{d_{ro}}) \\ &+ (d_{rg,s} - C_{d_{rg}})^{T} W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}), \end{aligned} \tag{17}$$

where

$$C_{*,ij} = \hat{*}_{s,ij} - \xi^{*}_{s,ij-1}, \quad \forall i \in \{1,\dots,n\},\ \forall j \in \{1,\dots,T\},\ \forall * \in \{x, \omega, d_{ro}, d_{rg}\}. \tag{18}$$

According to the performance indicator $P$, finding a desired control policy is equivalent to maximizing $P$. Since $P$ is a quadratic function, an analytical solution can easily be obtained by the gradient method. Noting, however, that there exist constraints between the variables, as shown in (9), maximizing (17) turns into a constrained optimization problem. Usually this kind of problem can be solved by constructing a Lagrange function, defined as follows:

$$L(x_s, \omega_s, d_{ro,s}, d_{rg,s}, \lambda) = P + \lambda^{T} (x_s - d_{rg,s}), \tag{19}$$

where $\lambda$ is the Lagrange multiplier. Letting $\nabla L = 0$ and solving the partial derivatives with respect to $x_s$ and $d_{rg,s}$, we find

$$\begin{aligned} 2 W_x (x_s - C_x) + \lambda &= 0, \\ 2 W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}) - \lambda &= 0. \end{aligned} \tag{20}$$

Adding the two equations in (20), we find

$$2 W_{d_{rg}} (d_{rg,s} - C_{d_{rg}}) + 2 W_x (x_s - C_x) = 0. \tag{21}$$

Substituting $x_s = -d_{rg,s}$ into the above equation, we find

$$x_s = (W_x - W_{d_{rg}})^{-1} (W_x C_x + W_{d_{rg}} C_{d_{rg}}). \tag{22}$$

The weight matrices in (22) can be computed as the inverse of the current covariance matrix, namely,

$$W = \left( \hat{\Sigma}_t^{O} \right)^{-1}. \tag{23}$$
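As a sketch of how (18) and (20)–(23) combine at run time (again our own illustration, under the assumption made in the derivation above that only the $x$ and $d_{rg}$ terms are coupled by the constraint; the function name and arguments are ours):

```python
import numpy as np

def optimal_command(x_hat, x_prev, drg_hat, drg_prev, sigma_x, sigma_drg):
    """One-step optimal increment from Eqs. (18), (20)-(23).

    x_hat, drg_hat: desired values at step j (retrieved by GMR);
    x_prev, drg_prev: actually reached values at step j - 1;
    sigma_x, sigma_drg: current GMR covariance blocks for x and d_rg."""
    C_x = x_hat - x_prev              # Eq. (18)
    C_drg = drg_hat - drg_prev
    W_x = np.linalg.inv(sigma_x)      # Eq. (23)
    W_drg = np.linalg.inv(sigma_drg)
    # Eq. (22); note that W_x - W_drg must be invertible, i.e., the two
    # covariance blocks must not coincide at this time step.
    return np.linalg.solve(W_x - W_drg, W_x @ C_x + W_drg @ C_drg)
```

Iterating this computation along the retrieved trajectory produces the control sequence evaluated in the experiments below.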

5. Simulation Experiments

Experiments are implemented using a Turtlebot robot. In addition to the robot's own hardware, a Kinect is integrated to obtain distance information. The size of the map used in the simulation is 7.5 × 5 m. Two different maps are created to verify the learning effects. In the first map, the obstacle is located in the middle of the map with a small gap on its lower side, and the robot can reach the target place through the gap. The second map is similar to the first one except that it contains two obstacles; to reach the destination, the robot must avoid every obstacle in the map. Figure 5 shows the shape of the maps: Figures 5(a) and 5(b) present the 2D maps, and Figures 5(c) and 5(d) show the corresponding 3D simulation environments. The color maps in (a) and (b) stand for the cost cloud, which provides a reference for path planning. Warm colors such as yellow and red stand for a lower cost; on the contrary, cool tones stand for a higher cost. A path planner always tends to choose the trajectory with the least cost. The cost values are calculated by built-in functions in the ROS package. For convenience of description, we use task 1 and task 2 to denote the two obstacle avoidance tasks in maps (a) and (b). It is worth mentioning that we select relatively simple environments for both tasks because we put more emphasis on the mechanism behind obstacle avoidance: skills can be learned in a simple environment and then reused in complex situations.

Figure 5: Two different maps used in our simulation. (a) and (b) are actual maps generated by software; (c) and (d) are the simulation environments corresponding to the maps above them.

Figure 6: Demonstrated trajectories in the two situations. (a) Sampled data in task 1; (b) sampled data in task 2.

Figure 6 shows the demonstrated trajectories in the two situations. Coordinates of the goal states are set to (6, 3) and (6, 2) for task 1 and task 2, respectively. The start points for task 1 are randomly generated by sampling from uniform distributions between 0.5 and 2.5 for the $X$-coordinate and between 2 and 4 for the $Y$-coordinate. The maximum range is 2 m, which can guarantee good generalization performance for obstacle avoidance. For task 2, the start points are generated by sampling from uniform distributions from 0.5 to 2 and from 2 to 3.5.
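As a minimal sketch of this sampling scheme (the batch size of 20 demonstrations per task is our assumption; the paper does not state how many were collected):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_demos = 20  # assumed number of demonstrations per task

# Task 1: start points uniform in [0.5, 2.5] x [2.0, 4.0], goal at (6, 3)
starts_task1 = np.column_stack((rng.uniform(0.5, 2.5, n_demos),
                                rng.uniform(2.0, 4.0, n_demos)))

# Task 2: start points uniform in [0.5, 2.0] x [2.0, 3.5], goal at (6, 2)
starts_task2 = np.column_stack((rng.uniform(0.5, 2.0, n_demos),
                                rng.uniform(2.0, 3.5, n_demos)))
```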

5.1. Experiments for Trajectory Encoding and Retrieval

Figure 7: Trajectory encoding and retrieval using GMM and GMR for task 1. (a) Model encoding with GMM; the dark blue ellipses represent projections of the Gaussian components. (b) The general trajectory generated by GMR; the red shading represents the generation region.

Figure 7 shows the encoding and retrieval results for task 1. The encoded variables are the location information $x$ and the distance information $d_{rg}$; both variables are 2-dimensional vectors. The number of Gaussian components is set to 6. It is worth noting that the optimal number of components $K$ in a model may not be known beforehand. A common method consists of estimating multiple models with an increasing number of components and selecting an optimum based on some model selection criterion, such as cross validation, the Bayesian Information Criterion (BIC), or Minimum Description Length (MDL). In this paper we choose $K$ by cross validation. As the figures show, the variance at the starting position is large, corresponding to a big retrieval error; it indicates that there are no strict requirements on the starting position. The variance at the end point is quite small, which indicates that the robot can reach the goal state accurately. Thus, GMM can successfully extract the constraints that characterize the task, and GMR can retrieve a barrier-free trajectory using the information encoded by GMM, as shown in the second column of Figure 7. The figures in the second row show the encoding and retrieval results for the variable $d_{rg}$. As can be seen, $d_{rg}$ shares similar distribution properties with $x$. This is because the two variables have a specific relationship with each other, and so do the mean value and covariance, which indirectly verifies the validity of the method.
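A compact way to run such a model-selection sweep is sketched below (our own code, not the authors'; it scores candidate models with BIC via scikit-learn's GaussianMixture, whereas the paper uses cross validation, so the returned $K$ may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_n_components(data, k_range=range(2, 16)):
    """Fit one GMM per candidate K and keep the K with the lowest BIC."""
    bics = []
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=5, random_state=0).fit(data)
        bics.append(gmm.bic(data))
    return list(k_range)[int(np.argmin(bics))]
```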

Figure 8: Encoding and retrieval for the orientation and distance information in task 1; the panels share the same meaning as the charts in Figure 7.

Figure 8 shows the encoding results for the orientation and distance information. The figures in the last two rows display the encoding of $d_{rg}$ in the $X$ and $Y$ directions. The coordinate values in the $X$ direction decrease monotonically to 0, because the robot keeps moving from left to right across the map. The coordinates in the $Y$ direction go up at the beginning and then drop to zero, because the robot encounters obstacles midway: to avoid them, it has to move away from the goal state temporarily. Note that the final distance value is close to (0, 0), which means the robot reaches the target successfully.

Figures 9 and 10 display the encoding and retrieval results for task 2. Compared with task 1, task 2 is more complicated considering the regularities behind the data, and encoding them requires more Gaussian components. In our experiments, $K$ is set to 10, chosen by the method described for task 1. As illustrated in these figures, the method still performs satisfactorily in the more complex environment.

Figure 9: Trajectory encoding and retrieval using GMM and GMR for the variables $x$ and $d_{rg}$ in task 2.

5.2. Experiments for Finding the Control Strategy. The experiments in this part are mainly conducted to verify the derived control policy. For a Turtlebot robot, the control inputs are the linear velocity $\dot{x}_s$ and the angular velocity $\dot{\theta}_s$. Assuming we have a desired trajectory (the trajectory can be given arbitrarily; in our experiments, in order to have a reference, we choose the second trajectory in our samples), we can compute $\dot{x}_s$ and $\dot{\theta}_s$ using the theory derived in the previous sections. Iterating over every time step, we get a control sequence. Comparing the calculated values with the values recorded by the sensors, we can verify the validity of the derived results.

Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is divided into velocities in the $X$ and $Y$ directions. As the figure in the second row shows, the expected velocity in the $Y$ direction remains close to 0, because the control inputs for the Turtlebot are a linear velocity in the forward ($X$) direction and an angular velocity. The line graph for the $Y$ velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The $X$ velocity follows the expected value closely. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from the actual one, since the frequency of the readings from the robot's sensors is lower than the frequency of the commands. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper we use dynamic time warping (DTW) for this purpose. The signals after alignment show a high correlation, which means the calculated values are correct.
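A textbook DTW distance suffices for this alignment check; the sketch below (our own, not the paper's implementation) compares two 1-D sequences sampled at different rates:

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) dynamic time warping between two 1-D
    sequences sampled at different rates (e.g., commands vs. sensor log)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```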

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
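The path in Figure 11 can be reproduced from the command signals with simple dead reckoning; the sketch below assumes a unicycle model and Euler integration (the paper does not state which integration scheme was used):

```python
import numpy as np

def integrate_unicycle(v, omega, dt, pose0=(0.0, 0.0, 0.0)):
    """Dead-reckon a path from forward velocity v[k] and angular velocity
    omega[k] using Euler integration of the unicycle model."""
    x, y, theta = pose0
    xs, ys = [x], [y]
    for vk, wk in zip(v, omega):
        theta += wk * dt
        x += vk * np.cos(theta) * dt
        y += vk * np.sin(theta) * dt
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)
```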

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel approach in which the robot's obstacle avoidance skill is learned by showing the robot how to avoid an obstacle. Under this consideration, extraction of the task constraints is a key issue, that is, how to make the robot understand the task. To achieve this, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories.

Figure 10: Encoding and retrieval for the orientation and distance information in task 2.

Figure 11: Tracking performance using the derived policy (expected path versus actual path).

Figure 12: Comparisons between the expected and calculated values: (a) linear velocity; (b) angular velocity.

Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.

Future work mainly includes the following aspects: (i) in order to avoid obstacles online, an online incremental GMM algorithm should be used; (ii) in this paper a time-based trajectory is generated by GMR, that is, the time signal is used as the input. A more reasonable idea is to use state information as the input, so that the robot can change its decision according to the current environment; for example, using distance information as input and velocity as output, the control inputs can be generated directly; (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.

Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Doctoral Startup Fund in Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning – a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.

[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.

[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.

[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.

[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.

[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.

[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.

[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.

[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.

[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.

[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.

[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.

[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.

[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.

[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.

[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.

[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.

[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.

[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.

[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.

[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.

[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.

[23] U. Muller, J. Ben, E. Cosatto et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.

[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.

[25] C. L. Nehaniv and K. Dautenhahn, "Like me?-measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 8: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

8 Journal of Robotics

StartGoal

x

y

(a)

x

y

(b)

(c) (d)

Figure 5 Twodifferentmaps used in our simulation (a) and (b) are actualmaps generated by software (c) and (d) are simulation environmentcorresponding to maps above them

4 62x (m)

0

1

2

3

4

5

y(m

)

(a)

4 62x (m)

0

1

2

3

4

5

y(m

)

(b)

Figure 6 Demonstrated trajectories in two situations (a) represents sampled data in task 1 (b) represents sampled data in task 2

we putmore emphasis on themechanism behind the obstacleavoidance whichmeans we can learn skills from simple envi-ronment but these skills can be reused in complex situations

Figure 6 shows the demonstrated trajectories in twodifferent situations Coordinates of goal states are set to (6 3)

and (6 2) for task 1 and task 2 respectively The start pointsfor task 1 are randomly generated by sampling from uniformdistributions between 05 and 25 for 119883-coordinates and 2

to 4 for 119884-coordinates The maximum range is 2m whichcan guarantee a good generalization performance for obstacleavoidance For task 2 the start points are generated by sam-pling from uniform distributions from 05 to 2 and 2 to 35

51 Experiments for Trajectory Encoding and Retrieval Fig-ure 7 shows the encoding and retrieval results for task 1 Theencoded variables are location information 119909 and distance

Journal of Robotics 9

GMM

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

1 2 3 40

drgx (m)

(a)

GMR

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

2 40

drgx (m)

(b)

Figure 7 Trajectory encoding and retrieval using GMM and GMR for task 1 (a) represents model encoding with GMM Ellipse in dark bluerepresents projections of Gaussian components (b) represents general trajectory generated by GMR Shades in red color represent generationregion

information 119889rg and both variables are 2-dimensional vec-

tors Number of Gaussian components is set to 6 It isworth noting that the optimal number of components 119870

in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding

to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889

rg As can be seen from the figures 119889rg shares

similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so

10 Journal of Robotics

minus1

0

1120596

(rad

s)

10 200

t (s)

minus1

0

1

120596(r

ads

)

10 200

t (s)

10 200

t (s)

0

5

dro

(m)

10 200

t (s)

0

2

4

dro

(m)

10 200

t (s)

minus5

0

5

drg x

(m)

minus5

0

5

drg x

(m)

10 200

t (s)

minus5

0

5

10 200

t (s)10 200

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7

domean value and covariance which verify the validity of themethod indirectly

Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles

the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully

Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect

Journal of Robotics 11

GMM GMR

1

2

3

4

y(m

)

3 4 5 62

x (m)

1

2

3

4

y(m

)

3 4 5 62

x (m)

minus2

minus1

0

1

drg y

(m)

10 3 42

drgx (m)

10 3 42

drgx (m)

minus2

minus1

0

1

drg y

(m)

Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2

52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity

119909

119904and angular velocity

120579

119904 Assuming we have a

desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute

119909

119904and

120579

119904 Iterating on

every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results

Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884

direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth

noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct

Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time

6 Conclusion

In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory

12 Journal of Robotics

minus2

0

2120596

(rad

s)

10 20 30 400

t (s)

minus2

0

2

120596(r

ads

)

10 20 30 400

t (s)

0

2

4

dro

(m)

10 20 30 400

t (s)10 20 30 400

t (s)

0

2

4

dro

(m)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

10 20 30 400

t (s)10 20 30 400

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 10 Encoding and retrieval for orientation and distance information in task 2

Expected pathActual path

2 3 4 5 6 71x (m)

05

1

15

2

25

3

35

4

y(m

)

Figure 11 Tracking performance using the derived policy

Journal of Robotics 13

Expected valueCalculated value

50 100 150 2000t (s)

50 100 150 2000t (s)

minus202

Vx

(ms

)

minus101

Vy

(ms

)

(a) Linear velocity

Expected valueCalculated value

50 100 150 2000t (s)

minus202

120596(r

ads

)

(b) Angular velocity

Figure 12 Comparisons between the expected angular velocity andcalculated value

trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms

For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved

Competing Interests

The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)

References

[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo

World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007

[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012

[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for

fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989

[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986

[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006

[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009

[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012

[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005

[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010

[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997

[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003

[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004

[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007

[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995

[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998

[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005

14 Journal of Robotics

[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003

[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005

[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997

[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002

[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002

[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005

[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009

[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 9: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

Journal of Robotics 9

GMM

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

1 2 3 40

drgx (m)

(a)

GMR

0

1

2

3

4

y(m

)

4 62

x (m)

minus1

0

1

2

3

drg y

(m)

2 40

drgx (m)

(b)

Figure 7 Trajectory encoding and retrieval using GMM and GMR for task 1 (a) represents model encoding with GMM Ellipse in dark bluerepresents projections of Gaussian components (b) represents general trajectory generated by GMR Shades in red color represent generationregion

information 119889rg and both variables are 2-dimensional vec-

tors Number of Gaussian components is set to 6 It isworth noting that the optimal number of components 119870

in a model may not be known beforehand so far Acommon method consists of estimating multiple modelswith increasing number of components and selecting anoptimum based on some model selection criterion suchas Cross Validation Bayesian Information Criterion (BIC)or Minimum Description Length (MDL) In our paper wedecide 119870 by using Cross Validation criterion As the figuresshow variance for starting position is large corresponding

to a big retrieval error It indicates that there are no strictrequirements for starting position Variance for end point ispretty small which indicates that the robot can get the goalstate accurately Thus they show that GMM can successfullyextract useful constraints for characterization task GMR canretrieve barrier-free trajectory by using information encodedby GMM as shown in the second column of Figure 5 Figuresin the second row show encoding and retrieval results forvariable 119889

rg As can be seen from the figures 119889rg shares

similar distribution properties with 119909This is because the twovariables have a specific relationship between each other so

10 Journal of Robotics

minus1

0

1120596

(rad

s)

10 200

t (s)

minus1

0

1

120596(r

ads

)

10 200

t (s)

10 200

t (s)

0

5

dro

(m)

10 200

t (s)

0

2

4

dro

(m)

10 200

t (s)

minus5

0

5

drg x

(m)

minus5

0

5

drg x

(m)

10 200

t (s)

minus5

0

5

10 200

t (s)10 200

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7

domean value and covariance which verify the validity of themethod indirectly

Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles

the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully

Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect

Journal of Robotics 11

GMM GMR

1

2

3

4

y(m

)

3 4 5 62

x (m)

1

2

3

4

y(m

)

3 4 5 62

x (m)

minus2

minus1

0

1

drg y

(m)

10 3 42

drgx (m)

10 3 42

drgx (m)

minus2

minus1

0

1

drg y

(m)

Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2

52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity

119909

119904and angular velocity

120579

119904 Assuming we have a

desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute

119909

119904and

120579

119904 Iterating on

every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results

Figure 12 compares the calculated linear velocity andangular velocity with the desired value Linear velocity isdivided into velocity in 119883 direction and 119884 direction Asthe figure in the second row shows expected velocity in 119884

direction remains close to 0This is because the control inputsfor Turtlebot are a linear velocity in the forward directionthat is 119883 direction and an angular velocity The line graphfor velocity 119884 reveals that there is a minor fluctuation at thebeginning and end stage while the overall trend is consistentwith the fact Velocity 119909 follows the expected value perfectlyConsidering Figure 12(b) it can be seen that the calculatedvalue remains consistent with the actual one It is worth

noting that the length of the calculated sequence differs fromthe actual one since the frequency of the readings fromrobotrsquos sensors is lower than the frequency of commandsIn this case to compare the similarity of the two signalssequence alignment is an indispensable step In this paper weuse dynamic time warping (DTW) to achieve this purposeSignals after alignments show a high correlation whichmeans the calculated values are correct

Figure 11 shows the location information obtained byintegrating velocity signals The calculated control inputsallow the robot to follow the given trajectory accuratelyexcept a minor fluctuation at the beginning time

6 Conclusion

In this paper based on the idea of the imitation learning wepropose a novel idea that the robotrsquos obstacle avoidance skillcan be learned by showing the robot how to avoid an obstacleUnder this consideration task constraints extraction is a keyissue that is how to make robots understand the task Inorder to achieve this purpose the Gaussian mixture model isused to encode the data GMR is used to retrieve satisfactory

12 Journal of Robotics

minus2

0

2120596

(rad

s)

10 20 30 400

t (s)

minus2

0

2

120596(r

ads

)

10 20 30 400

t (s)

0

2

4

dro

(m)

10 20 30 400

t (s)10 20 30 400

t (s)

0

2

4

dro

(m)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

drg x

(m)

10 20 30 400

t (s)

minus5

0

5

10 20 30 400

t (s)10 20 30 400

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 10 Encoding and retrieval for orientation and distance information in task 2

Expected pathActual path

2 3 4 5 6 71x (m)

05

1

15

2

25

3

35

4

y(m

)

Figure 11 Tracking performance using the derived policy

Journal of Robotics 13

Expected valueCalculated value

50 100 150 2000t (s)

50 100 150 2000t (s)

minus202

Vx

(ms

)

minus101

Vy

(ms

)

(a) Linear velocity

Expected valueCalculated value

50 100 150 2000t (s)

minus202

120596(r

ads

)

(b) Angular velocity

Figure 12 Comparisons between the expected angular velocity andcalculated value

trajectories Eventually to obtain the control strategy a mul-tiobjective optimization problem is constructed The controlstrategy is derived by solving this problem Simulation exper-iments verified our algorithms

For the future improvement work mainly includes thefollowing aspects (i) in order to avoid obstacles onlineonline incremental GMM algorithm should be used (ii) inthis paper a time-based trajectory is generated by GMRthat is use of time signal as input A more reasonable ideais using state information as input so that the robot canchange its decision according to the current environmentfor example using distance information as input and velocityas output thus the control inputs can be directly generated(iii) reinforcement learning algorithms can be integrated tostrengthen robotsrsquo exploration ability so that better general-ization performance is achieved

Competing Interests

The authors declare that the grant scholarship andor fund-ing do not lead to any conflict of interests Additionally theauthors declare that there is no conflict of interests regardingthe publication of this manuscript

Acknowledgments

This work is supported by National Natural Science Founda-tion of China (Grant no 51505470) and the Dr Startup Fundin Liaoning province (20141152)

References

[1] E Masehian and D Sedighizadeh ldquoClassic and heuristicapproaches in robotmotion planningmdasha chronological reviewrdquo

World Academy of Science Engineering and Technology vol 23pp 101ndash106 2007

[2] P Raja and S Pugazhenthi ldquoOptimal path planning of mobilerobots a reviewrdquo International Journal of Physical Sciences vol7 no 9 pp 1314ndash1320 2012

[3] J C Latombe Robot Motion Planning Springer 2012[4] J Borenstein and Y Koren ldquoReal-time obstacle avoidance for

fast mobile robotsrdquo IEEE Transactions on Systems Man andCybernetics vol 19 no 5 pp 1179ndash1187 1989

[5] O Khatib ldquoReal-time obstacle avoidance for manipulators andmobile robotsrdquo International Journal of Robotics Research vol5 no 1 pp 90ndash98 1986

[6] H Mei Y Tian and L Zu ldquoA hybrid ant colony optimizationalgorithm for path planning of robot in dynamic environmentrdquoInternational Journal of Information Technology vol 12 no 3pp 78ndash88 2006

[7] M A P Garcia O Montiel O Castillo R Sepulveda and PMelin ldquoPath planning for autonomous mobile robot navigationwith ant colony optimization and fuzzy cost function evalua-tionrdquo Applied Soft Computing vol 9 no 3 pp 1102ndash1110 2009

[8] C Lopez-Franco J Zepeda N Arana-Daniel and L Lopez-Franco ldquoObstacle avoidance using PSOrdquo in Proceedings of the9th International Conference on Electrical Engineering Comput-ing Science and Automatic Control (CCE rsquo12) pp 1ndash6 MexicoCity Mexico September 2012

[9] H-QMin J-H Zhu andX-J Zheng ldquoObstacle avoidancewithmulti-objective optimization by PSO in dynamic environmentrdquoin Proceedings of the International Conference on MachineLearning and Cybernetics (ICMLC rsquo05) vol 5 pp 2950ndash2956IEEE Guangzhou China August 2005

[10] S Dutta ldquoObstacle avoidance of mobile robot using PSO-basedneuro fuzzy techniquerdquo International Journal of ComputerScience and Engineering vol 2 no 2 pp 301ndash304 2010

[11] W-G Han S-M Baek and T-Y Kuc ldquoGenetic algorithmbased path planning and dynamic obstacle avoidance of mobilerobotsrdquo in Proceedings of the IEEE International Conference onSystems Man and Cybernetics vol 3 pp 2747ndash2751 OrlandoFla USA October 1997

[12] J Tu and S X Yang ldquoGenetic algorithm based path planningfor a mobile robotrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo03) vol 1 pp1221ndash1226 September 2003

[13] Y Zhang and J Wang ldquoObstacle avoidance for kinematicallyredundant manipulators using a dual neural networkrdquo IEEETransactions on Systems Man and Cybernetics Part B Cyber-netics vol 34 no 1 pp 752ndash759 2004

[14] X Wang V Yadav and S N Balakrishnan ldquoCooperativeUAV formation flying with obstaclecollision avoidancerdquo IEEETransactions on Control Systems Technology vol 15 no 4 pp672ndash679 2007

[15] J Godjevac ldquoA learning procedure for a fuzzy system applica-tion to obstacle avoidancerdquo in Proceedings of the InternationalSymposium on Fuzzy Logic pp 142ndash148 Zurich Switzerland1995

[16] R S Sutton and A G Barto Reinforcement Learning AnIntroduction MIT Press Boston Mass USA 1998

[17] J Michels A Saxena and A Y Ng ldquoHigh speed obstacleavoidance usingmonocular vision and reinforcement learningrdquoin Proceedings of the 22nd International Conference on MachineLearning (ICML rsquo05) pp 593ndash600 ACMBonnGermany 2005

14 Journal of Robotics

[18] C Ye N H C Yung and D Wang ldquoA fuzzy controller withsupervised learning assisted reinforcement learning algorithmfor obstacle avoidancerdquo IEEETransactions on SystemsMan andCybernetics Part B Cybernetics vol 33 no 1 pp 17ndash27 2003

[19] B-Q Huang G-Y Cao and M Guo ldquoReinforcement learningneural network to the problem of autonomous mobile robotobstacle avoidancerdquo in Proceedings of the International Confer-ence on Machine Learning and Cybernetics (ICMLC rsquo05) vol 1pp 85ndash89 August 2005

[20] C F Touzet ldquoNeural reinforcement learning for behavioursynthesisrdquo Robotics and Autonomous Systems vol 22 no 3-4pp 251ndash281 1997

[21] W D Smart and L P Kaelbling ldquoEffective reinforcement learn-ing for mobile robotsrdquo in Proceedings of the IEEE InternationalConference on Robotics and Automation (ICRA rsquo02) vol 4 pp3404ndash3410 Washington DC USA May 2002

[22] K Macek I Petrovic and N Peric ldquoA reinforcement learningapproach to obstacle avoidance ofmobile robotsrdquo inProceedingsof the 7th International Workshop on Advanced Motion Control(AMC rsquo02) pp 462ndash466 IEEE Maribor Slovenia July 2002

[23] U Muller J Ben E Cosatto et al ldquoOff-road obstacle avoidancethrough end-to-end learningrdquo in Advances in Neural Informa-tion Processing Systems pp 739ndash746 2005

[24] B D Argall S Chernova M Veloso and B Browning ldquoAsurvey of robot learning from demonstrationrdquo Robotics andAutonomous Systems vol 57 no 5 pp 469ndash483 2009

[25] C L Nehaniv and K Dautenhahn ldquoLike me-measures ofcorrespondence and imitationrdquo Cybernetics amp Systems vol 32no 1-2 pp 11ndash51 2001

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Research Article Robot Obstacle Avoidance Learning Based ...Research Article Robot Obstacle Avoidance Learning Based on Mixture Models HuiwenZhang, 1,2 XiaoningHan, 1,2 MingliangFu,

10 Journal of Robotics

minus1

0

1120596

(rad

s)

10 200

t (s)

minus1

0

1

120596(r

ads

)

10 200

t (s)

10 200

t (s)

0

5

dro

(m)

10 200

t (s)

0

2

4

dro

(m)

10 200

t (s)

minus5

0

5

drg x

(m)

minus5

0

5

drg x

(m)

10 200

t (s)

minus5

0

5

10 200

t (s)10 200

t (s)

minus5

0

5

drg y

(m)

drg y

(m)

Figure 8 Encoding and retrieval for orientation and distance information in task 1 sharing the same meaning with charts in Figure 7

domean value and covariance which verify the validity of themethod indirectly

Figure 8 shows encoding results for orientation and dis-tance information Figures in the last two rows displayencoding for 119889rg in 119883 and 119884 direction As we see coordinatevalues in119883 directionmonotone decrease to 0This is becausethe robot keeps moving from left to right in the map whilecoordinates in 119884 direction keep going up at the beginningand then drop to zero This is because the robot encountersobstacles in the middle moment in order to avoid obstacles

the robot has to go away from the goal state temporarily Payattention to the final distance value which is close to (0 0)which means the robot reaches the target successfully

Figures 9 and 10 display encoding and retrieval resultsfor task 2 Compared with task 1 task 2 is more complicatedconsidering the law behind the data In order to encode theselaws more Gaussian components are required In our exper-iments 119870 is set to 10 by using the method described in task1 As illustrated in these figures no matter how complex theenvironment is this method still has very satisfactory effect

Journal of Robotics 11

GMM GMR

1

2

3

4

y(m

)

3 4 5 62

x (m)

1

2

3

4

y(m

)

3 4 5 62

x (m)

minus2

minus1

0

1

drg y

(m)

10 3 42

drgx (m)

10 3 42

drgx (m)

minus2

minus1

0

1

drg y

(m)

Figure 9 Trajectory encoding and retrieval using GMM and GMR for variables 119909 and 119889 in task 2

52 Experiments for Finding Control Strategy Experiments inthis part are mainly conducted to verify the derived controlpolicy For a Turtlebot robot the control input is linearvelocity

119909

119904and angular velocity

120579

119904 Assuming we have a

desired trajectory (the trajectory can be given arbitrarily inour experiments in order to find a reference we choose thesecond trajectory in our samples) using theories derived inprevious sections we can compute

119909

119904and

120579

119904 Iterating on

every time we can get a control sequence Comparing thecalculated value with the sensor recorded value we can verifythe validity of the derived results

Figure 12 compares the calculated linear and angular velocities with the desired values. The linear velocity is decomposed into velocities in the X and Y directions. As the figure in the second row shows, the expected velocity in the Y direction remains close to 0, because the control inputs for the Turtlebot are a linear velocity in the forward direction, that is, the X direction, and an angular velocity. The line graph for the Y velocity reveals a minor fluctuation at the beginning and end stages, while the overall trend is consistent with this fact. The X velocity follows the expected value perfectly. Considering Figure 12(b), it can be seen that the calculated value remains consistent with the actual one. It is worth noting that the length of the calculated sequence differs from that of the actual one, since the robot's sensors are read at a lower frequency than the commands are issued. In this case, to compare the similarity of the two signals, sequence alignment is an indispensable step; in this paper, we use dynamic time warping (DTW) for this purpose. After alignment, the signals show a high correlation, which means the calculated values are correct.
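The paper does not spell out which DTW variant is used. The classic dynamic-programming recursion below is one minimal version, sufficient to align the command sequence with the lower-rate sensor sequence before computing a correlation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two 1-D
    signals sampled at different rates."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# e.g. dtw_distance(calculated_omega, recorded_omega)
```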

Figure 11 shows the location information obtained by integrating the velocity signals. The calculated control inputs allow the robot to follow the given trajectory accurately, except for a minor fluctuation at the beginning.
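The integration mentioned here is plain dead reckoning. A minimal Euler version for a unicycle model (our sketch, not necessarily the authors' exact scheme) is:

```python
import numpy as np

def integrate_unicycle(v, omega, dt, x0=0.0, y0=0.0, theta0=0.0):
    """Euler-integrate unicycle controls (v, omega) into a planar path,
    as used to obtain the position curves compared in Figure 11."""
    x, y, theta = [x0], [y0], [theta0]
    for vk, wk in zip(v, omega):
        theta.append(theta[-1] + wk * dt)
        x.append(x[-1] + vk * np.cos(theta[-1]) * dt)
        y.append(y[-1] + vk * np.sin(theta[-1]) * dt)
    return np.array(x), np.array(y)
```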

[Figure 10: Encoding and retrieval for orientation and distance information in task 2. Panels plot $\omega$ (rad/s), $d_{ro}$ (m), $d_{rg,x}$ (m), and $d_{rg,y}$ (m) against $t$ (s).]

[Figure 11: Tracking performance using the derived policy: expected path versus actual path in the $x$-$y$ plane (m).]

[Figure 12: Comparison between the expected and calculated values: (a) linear velocity ($V_x$ and $V_y$, in m/s, against $t$ (s)); (b) angular velocity ($\omega$, in rad/s, against $t$ (s)).]

6. Conclusion

In this paper, based on the idea of imitation learning, we propose a novel idea: a robot's obstacle avoidance skill can be learned by showing the robot how to avoid an obstacle. Under this consideration, task constraint extraction, that is, making the robot understand the task, is a key issue. To achieve this purpose, a Gaussian mixture model is used to encode the data, and GMR is used to retrieve satisfactory trajectories. Eventually, to obtain the control strategy, a multiobjective optimization problem is constructed, and the control strategy is derived by solving this problem. Simulation experiments verified our algorithms.

Future improvements mainly include the following aspects: (i) in order to avoid obstacles online, an online, incremental GMM algorithm should be used; (ii) in this paper, a time-based trajectory is generated by GMR, that is, the time signal is used as the input. A more reasonable idea is to use state information as the input, so that the robot can change its decision according to the current environment; for example, with distance information as the input and velocity as the output, the control inputs can be generated directly (a minimal sketch of such conditioning follows); (iii) reinforcement learning algorithms can be integrated to strengthen the robot's exploration ability, so that better generalization performance is achieved.
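Both the time-based retrieval used in this paper and the state-based variant proposed in item (ii) reduce to conditioning a joint Gaussian mixture on the chosen input dimensions. The sketch below illustrates that conditioning step; the function name and the assumption that the GMM parameters (weights, means, covariances) are already fitted are ours, not the paper's.

```python
import numpy as np

def gmr(weights, means, covs, x_in, in_dims, out_dims):
    """Gaussian mixture regression: condition a fitted GMM on the input
    dimensions (time, or distance information in the state-based
    variant) and return the expected output (e.g. position or velocity)."""
    h = np.empty(len(weights))
    mu_cond = []
    for k, (mu, S) in enumerate(zip(means, covs)):
        mu_i, mu_o = mu[in_dims], mu[out_dims]
        S_ii = S[np.ix_(in_dims, in_dims)]
        S_oi = S[np.ix_(out_dims, in_dims)]
        diff = x_in - mu_i
        # responsibility of component k for this input point
        dens = np.exp(-0.5 * diff @ np.linalg.solve(S_ii, diff))
        dens /= np.sqrt(np.linalg.det(2 * np.pi * S_ii))
        h[k] = weights[k] * dens
        # conditional mean of the output dimensions for component k
        mu_cond.append(mu_o + S_oi @ np.linalg.solve(S_ii, diff))
    h /= h.sum()
    return sum(hk * m for hk, m in zip(h, mu_cond))
```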

Competing Interests

The authors declare that the grant, scholarship, and/or funding do not lead to any conflict of interests. Additionally, the authors declare that there is no conflict of interests regarding the publication of this manuscript.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 51505470) and the Dr. Startup Fund in Liaoning Province (20141152).

References

[1] E. Masehian and D. Sedighizadeh, "Classic and heuristic approaches in robot motion planning – a chronological review," World Academy of Science, Engineering and Technology, vol. 23, pp. 101–106, 2007.
[2] P. Raja and S. Pugazhenthi, "Optimal path planning of mobile robots: a review," International Journal of Physical Sciences, vol. 7, no. 9, pp. 1314–1320, 2012.
[3] J. C. Latombe, Robot Motion Planning, Springer, 2012.
[4] J. Borenstein and Y. Koren, "Real-time obstacle avoidance for fast mobile robots," IEEE Transactions on Systems, Man, and Cybernetics, vol. 19, no. 5, pp. 1179–1187, 1989.
[5] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986.
[6] H. Mei, Y. Tian, and L. Zu, "A hybrid ant colony optimization algorithm for path planning of robot in dynamic environment," International Journal of Information Technology, vol. 12, no. 3, pp. 78–88, 2006.
[7] M. A. P. Garcia, O. Montiel, O. Castillo, R. Sepulveda, and P. Melin, "Path planning for autonomous mobile robot navigation with ant colony optimization and fuzzy cost function evaluation," Applied Soft Computing, vol. 9, no. 3, pp. 1102–1110, 2009.
[8] C. Lopez-Franco, J. Zepeda, N. Arana-Daniel, and L. Lopez-Franco, "Obstacle avoidance using PSO," in Proceedings of the 9th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE '12), pp. 1–6, Mexico City, Mexico, September 2012.
[9] H.-Q. Min, J.-H. Zhu, and X.-J. Zheng, "Obstacle avoidance with multi-objective optimization by PSO in dynamic environment," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 5, pp. 2950–2956, IEEE, Guangzhou, China, August 2005.
[10] S. Dutta, "Obstacle avoidance of mobile robot using PSO-based neuro fuzzy technique," International Journal of Computer Science and Engineering, vol. 2, no. 2, pp. 301–304, 2010.
[11] W.-G. Han, S.-M. Baek, and T.-Y. Kuc, "Genetic algorithm based path planning and dynamic obstacle avoidance of mobile robots," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, vol. 3, pp. 2747–2751, Orlando, Fla, USA, October 1997.
[12] J. Tu and S. X. Yang, "Genetic algorithm based path planning for a mobile robot," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03), vol. 1, pp. 1221–1226, September 2003.
[13] Y. Zhang and J. Wang, "Obstacle avoidance for kinematically redundant manipulators using a dual neural network," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 1, pp. 752–759, 2004.
[14] X. Wang, V. Yadav, and S. N. Balakrishnan, "Cooperative UAV formation flying with obstacle/collision avoidance," IEEE Transactions on Control Systems Technology, vol. 15, no. 4, pp. 672–679, 2007.
[15] J. Godjevac, "A learning procedure for a fuzzy system: application to obstacle avoidance," in Proceedings of the International Symposium on Fuzzy Logic, pp. 142–148, Zurich, Switzerland, 1995.
[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Boston, Mass, USA, 1998.
[17] J. Michels, A. Saxena, and A. Y. Ng, "High speed obstacle avoidance using monocular vision and reinforcement learning," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 593–600, ACM, Bonn, Germany, 2005.
[18] C. Ye, N. H. C. Yung, and D. Wang, "A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 1, pp. 17–27, 2003.
[19] B.-Q. Huang, G.-Y. Cao, and M. Guo, "Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance," in Proceedings of the International Conference on Machine Learning and Cybernetics (ICMLC '05), vol. 1, pp. 85–89, August 2005.
[20] C. F. Touzet, "Neural reinforcement learning for behaviour synthesis," Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 251–281, 1997.
[21] W. D. Smart and L. P. Kaelbling, "Effective reinforcement learning for mobile robots," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02), vol. 4, pp. 3404–3410, Washington, DC, USA, May 2002.
[22] K. Macek, I. Petrovic, and N. Peric, "A reinforcement learning approach to obstacle avoidance of mobile robots," in Proceedings of the 7th International Workshop on Advanced Motion Control (AMC '02), pp. 462–466, IEEE, Maribor, Slovenia, July 2002.
[23] U. Muller, J. Ben, E. Cosatto, et al., "Off-road obstacle avoidance through end-to-end learning," in Advances in Neural Information Processing Systems, pp. 739–746, 2005.
[24] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469–483, 2009.
[25] C. L. Nehaniv and K. Dautenhahn, "Like me? Measures of correspondence and imitation," Cybernetics & Systems, vol. 32, no. 1-2, pp. 11–51, 2001.
