Robot path planning for maze navigation

Robot Path Planning for Maze Navigation Dimitris C. Dracopoulos

Brunel University Department of Computer Science

London, UK E-mail: Dimitris . DracopoulosQbrunel . ac . uk .

Abstract- This paper presents the application of niul- tilayer perceptrons to the robot path planning problem, and in particular to the task of maze navigation. Previous published results implied that the training of feedforward multilayered networks failed, because of the non-smoothness of data. Here the same maze problem is revisited.

1. Introduction N E of the major components for the creation of 0 autonomous robots is the ability of a robot to

“plan its paths” and in general the ability to “plan its motion”. In a limited or carefully engineered environment it is possible to program the robot for all possible combinations of motions in order to accomplish specific tasks [l]. But even when this is possible, one would like to tell the robot “what” to do, rather than “how” to do it, making its operation much easier [l].

In general however, “pre-programming” a robot for all possible cases or conditions it will meet is impossible, due to the fact that the number of motion combinations can be large or infinite (as is the case in non- tracked systems). In addition, it is highly desirable to have robots which are able to adapt and operate in unknown or (changing environments. Adaptation, robustness and operation in a wide range of environments, provides robots with a higher degree of “intelligence”. Such an intelligence can be achieved only through learning.

This paper considers the application of artificial neural networks (and more specifically multilayer perceptrons) to robot path planning. The problem which is addressed here is that of maze navigation.

Apart from the obvious industrial applications the solution of such a problem is inspired from daily life. Humans seem to be able to find their optimal path through rooms they have not visited before, without keep bumping or colliding with the various obstacles that lie in the room. Somehow, they are able to “see” the obstacles and make an appropriate optimum path planning so as to avoid them [2]. In contrast with this, many path planning techniques (including machine learning algorithms) seem to lead the robot through a room and attempt to “force” the robot to

learn its future path, after a number of collisions with obstacles in the environment. Such an approach besides being an unnatural way (when compared with humans), it is infeasible for most real industrial applications, as the result of a robot colliding with its environment can be catastrophic for the robot itself and damaging for the environment.

The next section provides some background in path planning and relevant techniques. Section 3 describes the simple maze problem considered here, and section 4 proposes an artificial neural network architecture for the solution of the maze problem along with the results obtained from this approach. Finally, section 5 summarises what was achieved and gives directions for future work.

2. Path planning The general problem of path planning for au-

tonomous robots is defined as the search for a path which a robot (with specified geometry) has to follow in a described environment, in order to reach a particular position and orientation B , given an initial position and orientation A (Figure 1). The path is subject to certain constraints which serve to avoid obstacle col- lision and optimise the performance of the robot (e.g. find the path with the minimum distance, or find the path which minimises the energy spent by the robot).

Sometimes the motion of a robot is restricted to particular paths or roadways, the railway tracks [3]. In this case, the problem of path planning is equivalent to that of graph search and any of the graph search algorithms can be utilised (e.g. A* or Dijkstra’s algo- rithm). In general however, systems are non-tracked and the number of routes a robot can follow is infinite.

Real robot path planning becomes even more com- plicated due to the fact that the shape of the robot has to be taken into account. A tool which is commonly used to face this extra complication is the configuration space. The robot’s configuration space C represents the robot as a point and maps the obstacles in this space in an appropriate way. This mapping transforms the problem of planning the motion of a dimensioned ob- ject into the problem of planning the motion of a point

0-7803-4859- 1/98 $1 O.oOO1998 IEEE 2081

Figure 1. Path plianning: the robot would l i e to move from the position and orientation A to the position and orientation B avoiding the shaded obstacles.

[l]. The full mathematical analysis of a path planning problem using the configuration space is too complex as the number of dimensions increases, therefore other techniques have 1;o be used in combination with the Configuration space.

Potential fields is one of the most popular methods. Following this approach, the robot resembles a parti- cle which is moving under the influence of a potential field created by the target configuration (position and orientation of t b e robot) and the obstacles in the C space. The target is considered to be charged nega- tively which makes the robot being attracted from it, while the obstacles have a positive charge resulting in a robot motion which avoids the obstacles. The total potential field. (with a negative gradient) applies a force to the robot which pulls it towards the goal. The direction of this force defines the robot’s trajectory.

A number of pcAential field algorithms exist. Many of them suffer from the problem of local minima. Depend- ing on the charges of the obstacles and the goal configuration, the robot may be leaded to a position where the force asked upon it is zero. In such a situation it cannot move any more and it is trapped. To overcome this problem, thie potential field methods have to be able to design potential functions which have no local minima (something which is not always easy and in general it is computationally complex), or to employ mechanisms which allow the robot to escape from local minima. Still however, the computational complexity of the path planning increases exponentially with the dimension of the robot’s conflguration space [l]. An additional complication which becomes a problem for

the application of such techniques, is an environment which is changing dynamically (e.g. if the obstacles axe not fixed but they are moving). In such a case, the potential function depends on time t and its design is an extremelly difllcult or impossible task (consider the extreme w e where the obstacles are moving in an unknown way, i.e. the environment is unknown).

Besides the environmental constraints which add an additional level of difficulty to the path planning problem, there are also kinematic constraints in the motion of the robot. Two broad categories for the kinematic constraints are usually considered holonomic and non- holonomic constraints. The former reduces the dimension of the configuration space attainable by the robot, while the latter reduces the number of possible motions. For example, a car-robot is subject to holonomic constraints as its motion is restricted to be always along its main axis.

3. The maze problem

It is well known that maze navigation is an important task in robotics. The maze problem considered in this paper was proposed by Werbos in [4]. This problem was used as a testbed for the capability of different artificial neural networks to approximate non-smooth functions. As pointed out in [2], the testbed was established due to the fact that the team in [5] was unable to succeed in the training of a multilayer perceptron to solve the problem.

The task is defined as follows: Given a maze of the type shown in Figure 2 (which may vary in size, number of dimensions or the configuration of the obstacles), find an appropriate path which moves a robot from an initial position I to a target position G, while minimis- ing the distance which the robot has to travel.

In [4], [SI a specialised neural network architecture, based on simultaneous recurrent networks was designed to approximate the dynamic programing J function of the maze navigation problem. Such an approximation is crucial for adaptive critics neurocontrol methods. These methods are approximate dynamic programming (ADP) techniques: given a utility function U which has to be maximized over all future time (which could be an infinite or a finite time horizon), find an approximation of the strategic utility function J, for which maximization in the short term will maximize U in the long term [6]. Exact dynamic programming is too “expensive” computationally for complex dynamic systems or large problems, therefore ADP methods are used. A neural network (called the crit ic network) learns to output (predict) an approximation J* of the function J. Thus the Bellman equation in dynamic

2082

X. X 1 g

Figure 2. The maze problem: And the full optimum path which will lead the robot from the position (xi, y;) to the position (xerye).

programming [7] is not

(1) solved exactly, but it is replaced by a neural network able to approximate J(R) by J*(R) [8].

However, in [4], [e] the emphasis was given in the capability of the simultaneous recurrent network architecture to approximate the J function, rather than solving the path planning problem for the 5 by 5 maze of Figure 2. In addixion, the results presented the approximation of the ,J function for a particular target goal, although recent experiments with the proposed SRN architecture [2], seemed to give promising results for generalisation in different mazes. Generalisation in cases where are ‘“seen” by the planner (either for different initial and final conditions or for different mazes) has a great importance for real world path planning applications. Letting a robot to move around a fixed environment colliding with objects and become famil- iar with it, in order to perform again exactly the same navigation task, is an approach of limited usefulness. Besides the cost involved in such techniques (due to the damages occurecl with the collisions), the necessity for intelligent behaviour defined by properties such as adaptivity and robulstness, makes their application in- appropriate for mamy cases. After all, humans are able to “see” their near optimum path, when they arrive in a room €or the fist time.

Intelligent behaviour of a robot can be achieved only

if the robot is capable of learning. Artificial neural networks have learning properties which make them ideal candidates for robot motion planning. However, a neural network application was unsuccessful for a problem of this type [5]. The next section proposes a neural architecture for the solution of the maze problem described here.

4. Neural Path Planning The maze problem illustrated in Figure 2 has an ex-

tra complication, which is not apparent when one considers the problem for the first time. For each of the cells and a specific goal position, there is more than one equally good direction [6]. Some of the cells have up to four equally good directions, as there are four different paths which are optimum for the target configuration. This can be very confusing for any path planner, including humans. In particular, such a case may cause many difficulties for the training of artificial neural networks. Since the mapping between current state and next adionlcurrent state is one-to-many, multilayer perceptrons will learn an incorrect model. This is true because standard supervised learning algorithms average over multiple targets, assuming a squared error criterion function [8]. In addition, such data can be non-smooth and training based on such data can be very difficult or impossible for most networks (this was the motivation for the work on specialised more powerful SRN networks in [6]).

The approach which is tested here is based on feedforward backpropagation-type neural networks. Al- though such networks suffer from the problem of local minima, in practice it is found that many local minima give good results. In addition a global minimum, based on the total error of the training data does not guar- antee or imply that the generdisation of the network will be better. However, one could apply a straightfor- ward validation test (by splitting the available data in three sets) and decide the point which the training of the network should be stopped [8].

The network used here is trained to predict the direction that the robot should move at the next time step. Hence, one output node is used. The inputs to the network are the current position of the robot (z, y) and its target position (zg,yg). A network architecture with two hidden layers and a total size of 4-10- 10-1 was used (Figure 3), after having done a number of experiments, in order to determine it. However, no real effort in optimising the network architecture was attempted.

The training set consisted of 174 distinct data, mapping the current position and target position to the optimum action. These training data were generated

2083

4 North, South, East or West

t t t t X Y g

X

Figure 3. The arlchitecture of the network used. The inputs to the network are the current robot position (x,y) and its goal (+,yg). Its output determines whether the robot will move North, South, East or West at ,the next time step.

from the optimum full paths of 50 trajectories. The trajectories were chosen from 50 different initial-target conditions. The generation of training data taken out of optimum full trajectories makes the training of the network an easier task, as the data produced in this way are “smoother”. The target values in the output node were scaled in the range [-0.9,0.9] so as to avoid the “saturated areas” of the sigmoidal function used.

Standard multilayer perceptron training can be sig- niscantly improved in terms of speed convergence if an adaptive learning rate is used [9]. In all of the experiments described here, a different rule for adapting the learning rate a was used as follows:

= { 1.05a, otherwise 0 . 7 ~ , if

(2) error ’ 1-04 Using this learning rate update after each iteration of the training samples, not only speeds up the process of learning but can also help to avoid many local minima. The initial value which was used here for the learning rate was 0.02. tn addition, the learning rule used to update the weights utilised the standard momentum term with a coefficient of 0.7.

The possible number of combinations for inputs and outputs to the network is 462 (if one ignores the trivial

case where the initial and final conditions are the same, so the robot does not have to move). The described feedforward network, learned perfectly the 174 training samples and its training was stopped after 50,000 iterations. The remaining 288 samples were used to test the generalisation capability of the network. Out of the 288 test data, the network predicted correctly the next move of the robot in 203 cases. This is based in the assumption that only one move out of the four is correct (hypothesis of uniqueness) and priority of cor- rectness is given according to the order: North, South, East, West. That is, if more than one moves lead the robot to an optimum path, we count as correct only the move which comes first in the described order (this assumption was made as the training targets were generated in the same fashion). However, if one accepts as correct prediction any of the moves which lead the robot to the optimum path (something which is much more realistic and the true case), then the generalisation capability of the network exceeds 80%. Such a high accuracy is very desirable in real world path planning problems.

5. Conclusions and future work This work presented how multilayer perceptrons can

be applied to the robot path planning problem. For this purpose, the task of maze navigation was considered. Previously published work indicated that the problem could not be solved, using the standard feedforward backpropagation type networks [5]. The results shown here suggest, that such an approach is not only feasible for path planning problems, but also that the accuracy which can be achieved is quite high.

Future work has to demonstrate how well the neural architecture will scale when applied to mazes of different size or to mazes where the robot has more degrees of freedom. Although initially it can be thought that more degrees of freedom make the problem more difficult, such a case will generate much smoother data, something which usually makes the training of multilayer perceptrons an easier task.

References Jean-Claude Latombe, Robot Motion Planning, Kluwer Aca- demic Publishers, 1991. Paul J. Werbos, ”, 1996, personal communication. Stephen Cameron, “Obstacle avoidance and path planning” , Industrial Robot, vol. 21, no. 5, 1994. Paul J. Werbos and Xiaozhong Pang, “Generalized maze navigation: SRN critics solve what feedforward or hebbian nets cannot”, in World Congress on Neuml Networks, Son Diego, California. September 1996, pp. 88-93, Lawrence Erl- baum Associates,Inc. and INNS Press. P. Houillon and A. Caron, “Planar robot control in clut- tered space by artificial neural network”, Journal of Math Modeling and Computing, pp. 498-502, 1993.

2084

[6] Xiaozhong Pang and Paul J. Werbos, "Neural network design for J function a.pproximation in dynamic programming", Neural Networks, t D appear.

[7] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-Dynamic Pmgmmming, Athima Scientific, 1996.

[SI Dimitris C. Dracopoulos, Evolutionary Learning Algorithms for Neuml Adaptive Control, Springer Verlag, August 1997.

[9] David A. White and Donald A. Sofge, Eds., Handbook of intelligent Control, Van Nostrand Reinhold, 1992.

2085

Robot path planning for maze navigation

Documents

Transcript of Robot path planning for maze navigation