
    PPCP: Efficient Probabilistic Planning with Clear Preferences in Partially-Known Environments

    Maxim Likhachev
    The Robotics Institute
    Carnegie Mellon University
    Pittsburgh, PA 15213
    maxim+@cs.cmu.edu

    Anthony Stentz
    The Robotics Institute
    Carnegie Mellon University
    Pittsburgh, PA 15213
    axs@rec.ri.cmu.edu

    Abstract

    For most real-world problems the agent operates in only partially-known environments. Probabilistic planners can reason over the missing information and produce plans that take into account the uncertainty about the environment. Unfortunately, they can rarely scale up to the problems that are of interest in the real world. In this paper, however, we show that for a certain subset of problems we can develop a very efficient probabilistic planner. The proposed planner, called PPCP, is applicable to the problems for which it is clear what values of the missing information would result in the best plan. In other words, there exists a clear preference for the actual values of the missing information. For example, in the problem of robot navigation in partially-known environments it is always preferred to find out that an initially unknown location is traversable rather than not. The planner we propose exploits this property by using a series of deterministic A*-like searches to construct and refine a policy in an anytime fashion. On the theoretical side, we show that once converged, the policy is guaranteed to be optimal under certain conditions. On the experimental side, we show the power of PPCP on the problem of robot navigation in partially-known terrains. The planner can scale up to very large environments with thousands of initially unknown locations. We believe that this is several orders of magnitude more unknowns than what the current probabilistic planners developed for the same problem can handle. Also, despite the fact that the problem we experimented on in general does not satisfy the conditions for solution optimality, PPCP still produces solutions that are nearly always optimal.

    Introduction

    For many real-world problems the agent operates in an environment that is only partially-known. Examples of such problems include robot navigation in partially-known terrains, route planning under partially-known traffic conditions, air traffic management with changing weather conditions and others. Ideally, in all of these situations, to produce a plan, a planner needs to reason over the probability distribution over all the possible instances of the environment. Such planning corresponds to POMDP planning, however, and is known to be hard (PSPACE-complete (Papadimitriou & Tsitsiklis 1987)).

    This work was sponsored by the U.S. Army Research Laboratory, under contract Robotics Collaborative Technology Alliance (contract number DAAD19-01-2-0012). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The authors would also like to thank Dave Ferguson for his very helpful comments on the paper. Copyright © 2006, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

    For many of these problems, however, one can clearly name beforehand the best values of the variables that represent the unknowns in the environment. We call such values preferred values. Thus, in the robot navigation problem, for example, it is always preferred to find out that an initially unknown location is traversable rather than not. In the problem of route planning under partially-known traffic conditions, it is always preferred to find out that there is no traffic on the road of interest. And in the air traffic management problem it is always preferred to have good flying weather. In this paper we give an algorithm called PPCP (Probabilistic Planning with Clear Preferences) that is able to scale up to very large problems by exploiting this property.

    PPCP constructs and refines a plan by running a series of deterministic searches. By making a certain approximating assumption about the problem, PPCP keeps the complexity of each search low and independent of the amount of the missing information. Each search is extremely fast, and running a series of fast low-dimensional searches turns out to be much faster than solving the full problem at once since the memory requirements are much lower. While the assumption the algorithm makes does not need to hold for the found plan to be valid, the plan is guaranteed to be optimal if the assumption holds.

    We demonstrated the power of PPCP on the problem of robot navigation in partially-known environments. We found that it was able to scale up to large (0.5 km by 0.5 km) environments with thousands of initially unknown locations.

    The Motivating Problem

    The motivation for our research was the problem of robot navigation in partially-known terrains. In outside environments it is often the case that the robot has only a rough map of the environment (for example, a map based on a satellite image as shown in figure 1(b), or a map constructed from previous scans of the environment). In such a map many places cannot be classified as traversable or untraversable with full certainty. Instead, for each such place we can estimate a probability of it being obstructed.

    Figure 1: Robot Navigation in Partially-Known Terrain. Panels: (a) XUV; (b) satellite image; (c) planning map; (d) navigation with freespace assumption; (e) policy produced by PPCP; (f) navigation with PPCP planning.

    Figure 1(a) shows the ground vehicle used to motivate this task. The robot has a number of sensors on it, including a laser rangefinder for scanning the environment. Figure 1(b) shows a satellite image of the environment in which the robot operates. It covers an area of 1 km by 1 km. This image is post-processed to produce a discretized 2D environment of size 1000 by 1000 cells. Each cell is associated with a traversability cost. Figure 1(c) shows a simpler planning map of size 200 by 200 cells. In this particular example, 5 cells are assumed to be unknown (shown as raised white rectangles) and are associated with the probability of them being obstructed; we assume independence between unknown squares. To simplify the problem we only consider two possibilities for each of these cells: traversable or not.[1] (The described PPCP algorithm, however, can handle more than two possible values for each unknown cell.) The cells that are occupied by obstacles are shown in dark red. The robot is located in the near right corner of the map, while its goal is at the far end of the map.

    The robot must take the unknown locations into account when planning its route. It may plan a path through these locations, but it risks having to turn back if its way is blocked. For example, figure 1(d) shows the path traversed by the robot if it always assumes that unknown locations are free unless sensed otherwise and plans and follows the shortest trajectories under this assumption (the freespace assumption (Koenig & Smirnov 1996)). The trajectory traversed by the robot is shown in white, while the planned path is shown in black. We assume that the sensing range of the robot is 1 meter, which corresponds to 1 cell. We also assume perfect sensors: after the robot senses a square it knows its true state forever after.
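The freespace-assumption baseline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a 4-connected grid with unit move costs, uses breadth-first search in place of a cost-aware planner, and the names `shortest_path` and `navigate_freespace` as well as the tiny 3 by 5 map are hypothetical.

```python
from collections import deque

def shortest_path(grid_size, blocked, start, goal):
    """Breadth-first shortest path on a 4-connected grid; returns a list of cells or None."""
    rows, cols = grid_size
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in blocked and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None

def navigate_freespace(grid_size, known_obstacles, truly_blocked_unknowns, start, goal):
    """Move along shortest paths that treat unsensed cells as free, replanning after sensing."""
    believed_blocked = set(known_obstacles)
    position = start
    trajectory = [start]
    while position != goal:
        # Perfect sensor with a range of one cell: learn the true state of the neighbours.
        r, c = position
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in truly_blocked_unknowns:
                believed_blocked.add(nb)
        path = shortest_path(grid_size, believed_blocked, position, goal)
        if path is None:
            return None  # the goal is unreachable given what has been sensed
        position = path[1]  # take one step, then sense and replan
        trajectory.append(position)
    return trajectory

# Hypothetical map: a wall in the middle row, and an unknown cell (0, 3) that is truly blocked.
trajectory = navigate_freespace(
    grid_size=(3, 5),
    known_obstacles={(1, 1), (1, 2), (1, 3)},
    truly_blocked_unknowns={(0, 3)},
    start=(0, 0),
    goal=(0, 4),
)
print(len(trajectory) - 1)  # number of moves made, including the backtracking
```

In this toy map the robot commits to the top corridor, discovers the blocked cell only when it is adjacent to it, and has to return to the start before taking the bottom corridor, which is the kind of turning back the paragraph above describes.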

    [1] This assumption also better models the problem of robot navigation in an environment where some locations can be occupied by adversaries. We are actually more interested in this problem, but use the problem of robot navigation in partially-known terrains to illustrate the algorithm since it is easier to describe, is highly related and is also an important problem in robotics.

    In contrast to planning with the freespace assumption, figure 1(e) shows the full policy after convergence (in black) produced by our algorithm. This policy happens to be optimal: it minimizes the expected distance traveled by the robot before it reaches its destination. According to the policy the robot should first try the path on its left since there are 4 unknown locations there and therefore there is a higher chance that one of them will be free and the robot will be able to go through. Figure 1(f) shows the actual trajectory traversed by the robot after it executed the policy.
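The intuition that several unknown locations give a better chance of getting through can be checked with a short calculation. The sketch below assumes that each unknown cell is an alternative passage, that cells are obstructed independently, and that each has the same obstruction probability; the value 0.5 and the function name `prob_at_least_one_free` are illustrative assumptions, not figures from the paper.

```python
def prob_at_least_one_free(p_obstructed: float, num_cells: int) -> float:
    """Probability that at least one of num_cells independent unknown cells is free."""
    return 1.0 - p_obstructed ** num_cells

p = 0.5  # assumed obstruction probability for every unknown cell
left = prob_at_least_one_free(p, 4)   # side with 4 unknown cells
right = prob_at_least_one_free(p, 1)  # side with the remaining unknown cell
print(left, right)  # 0.9375 0.5
```

The optimal policy also weighs the travel cost of each attempt, so this calculation only illustrates the probability part of the argument.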

    Designing a good policy for the robot is a POMDP planning problem, which is in general very hard to solve. In the POMDP representation, a state is the position of the robot and the true status of each unknown location. The position of the robot is observable, and the hidden variables are whether each unknown place is occupied. The number of states is $(\#\text{ of robot locations}) \times 2^{\#\text{ of unknown places}}$. So, the number of states is exponential in the number of unknown places and therefore quickly becomes very large. We can define, however, clear preferences for the unknown locations: it is always preferred for an unknown location to be free rather than obstructed. This makes our PPCP algorithm applicable to the problem.
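To make the growth concrete, the state count can be evaluated for the planning map of figure 1(c). The sketch below assumes that every one of the 200 by 200 cells counts as a possible robot location, which is an upper bound since obstacle cells are included; the function name `pomdp_state_count` and the 1000-unknown case are hypothetical.

```python
def pomdp_state_count(num_robot_locations: int, num_unknown_places: int) -> int:
    """Robot positions times every free/obstructed assignment of the unknown places."""
    return num_robot_locations * 2 ** num_unknown_places

# Planning map from the example: 200 x 200 cells, 5 unknown cells.
small = pomdp_state_count(200 * 200, 5)
print(small)  # 1280000

# Hypothetical case: the same map with 1000 unknown cells.
large = pomdp_state_count(200 * 200, 1000)
print(large > 10 ** 300)  # True
```

With only 5 unknowns the state space is already over a million states, and with thousands of unknowns it is far too large to enumerate.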

    Notations and Assumptions

    In this section we introduce some notation and formalize mathematically the class of problems our algorithm is suitable for. Throughout the rest of the paper, to help explain various concepts we will often illustrate them on the problem of robot navigation in partially-known environments described in the previous section.

    We assume that the environment is fully deterministic and can be modeled as a graph. That is, if we were to know the true value of each variable that represents the missing information about the environment then there would be no uncertainty in the outcome of any action. There are certain elements of the environment, however, whose status we are uncertain about and which affect the outcomes (and/or possible costs) of one or more actions. In