Trade-off between Exploration and Exploitation in Satisficing Planning
Fan Xie
Outline
- What is Satisficing Planning?
- Heuristic Search in Planning
- Why do we need exploration?
- Analysis of Arvand
- Arvand-LTS: Arvand with Local MCTS
- Experiments
AI Planning
Satisficing Planning
- Deterministic environment
- Solutions need not be optimal; sub-optimal solutions are acceptable
- Domain-independent planning
- Implicit representation of the search space. (Why not an explicit representation? It is impossible in most cases because the state space is huge.)
- Example: an initial state s0, a set of actions A, and a set of requirements G that a goal state must satisfy
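To make "implicit representation" concrete, here is a minimal sketch assuming a STRIPS-style encoding (facts, preconditions, add and delete lists); the slides do not name a formalism, and the class names, field names, and truck/package facts are hypothetical. Successor states are computed on demand from s0 and A, so the state space is never enumerated.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple

State = FrozenSet[str]  # a state is the set of facts that currently hold


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]     # facts required for the action to be applicable
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


@dataclass(frozen=True)
class Task:
    s0: State                     # initial state
    actions: Tuple[Action, ...]   # action set A
    goal: FrozenSet[str]          # requirements G of a goal state

    def is_goal(self, s: State) -> bool:
        # A goal state is any state that satisfies every requirement in G.
        return self.goal <= s

    def successors(self, s: State) -> Iterator[Tuple[Action, State]]:
        # Successors are generated on demand; the full state space is never built.
        for a in self.actions:
            if a.pre <= s:
                yield a, (s - a.delete) | a.add


def fs(*facts: str) -> FrozenSet[str]:
    return frozenset(facts)


# Hypothetical toy task: move a package from location A to location B by truck.
task = Task(
    s0=fs("truck-at-A", "pkg-at-A"),
    actions=(
        Action("drive-A-B", fs("truck-at-A"), fs("truck-at-B"), fs("truck-at-A")),
        Action("drive-B-A", fs("truck-at-B"), fs("truck-at-A"), fs("truck-at-B")),
        Action("load-A", fs("truck-at-A", "pkg-at-A"), fs("pkg-in-truck"), fs("pkg-at-A")),
        Action("unload-B", fs("truck-at-B", "pkg-in-truck"), fs("pkg-at-B"), fs("pkg-in-truck")),
    ),
    goal=fs("pkg-at-B"),
)
print([a.name for a, _ in task.successors(task.s0)])  # ['drive-A-B', 'load-A']
```

Only the two actions whose preconditions hold in s0 produce successors; a search algorithm explores the space by calling `successors` repeatedly.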
Some Background
What is a heuristic? Here, a heuristic h(n) estimates how close a node n is to the goal.
Greedy Best-First Search: when expanding node n, take each successor n' and place it on a single open list ordered by h(n'); the node with the lowest h-value is expanded next.
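A minimal sketch of greedy best-first search, using a binary heap as the ordered open list. The integer toy domain (actions "+1" and "*2", states capped at 20, h = distance to 10) is hypothetical and only serves to exercise the search.

```python
import heapq
import itertools


def greedy_best_first_search(start, is_goal, successors, h):
    """Greedy best-first search: always expand the open node with the lowest h.

    Returns the list of states from start to a goal, or None if none is found.
    """
    tie = itertools.count()                  # tie-breaker: never compare states
    open_list = [(h(start), next(tie), start)]
    parent = {start: None}                   # doubles as the set of seen states
    while open_list:
        _, _, n = heapq.heappop(open_list)   # node with the lowest h(n)
        if is_goal(n):
            path = []
            while n is not None:             # walk parent links back to start
                path.append(n)
                n = parent[n]
            return path[::-1]
        for child in successors(n):
            if child not in parent:          # ignore already-seen states
                parent[child] = n
                heapq.heappush(open_list, (h(child), next(tie), child))
    return None


# Hypothetical toy domain: integers up to 20, actions "+1" and "*2", goal 10.
def int_successors(n):
    return [m for m in (n + 1, n * 2) if m <= 20]


path = greedy_best_first_search(1, lambda n: n == 10, int_successors, lambda n: abs(10 - n))
print(path)  # [1, 2, 4, 8, 9, 10]
```

The plan found is not the shortest one (1, 2, 4, 5, 10 uses one action fewer), which is acceptable in satisficing planning.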
Hill Climbing Search: check the neighbor nodes of the current node and move to a node that has a lower h-value than the current node (if there are several, the one with the lowest h-value).
The search terminates when no neighbor node has a lower h-value.
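A minimal hill-climbing sketch under the same successor/heuristic interface. The five-state "line" landscape and its h-values are hypothetical, chosen so that the search stops before reaching the goal.

```python
def hill_climbing(start, successors, h):
    """Move to the lowest-h neighbor while it is strictly better than the current node."""
    current = start
    while True:
        neighbors = list(successors(current))
        if not neighbors:
            return current
        best = min(neighbors, key=h)      # if several are better, take the lowest
        if h(best) >= h(current):         # no neighbor has a lower h-value: stop
            return current
        current = best


# Hypothetical 1-D landscape: states 0..4 with these h-values; state 4 is the goal.
H = [5, 3, 4, 2, 0]


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


end = hill_climbing(0, line_neighbors, lambda i: H[i])
print(end, H[end])  # 1 3  (stuck: both neighbors of state 1 are worse)
```

Started from state 2 instead, the same procedure reaches the goal, which shows how strongly pure greedy search depends on where it starts.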
Heuristic Search as Planning
FF Planner
- Hill climbing
- FF heuristic: not admissible
- Enforced hill climbing: more exploration in hill climbing to escape from local minima
LAMA Planner
- Greedy best-first search, followed by weighted A* (WA*)
- Mixed heuristic: FF + landmark
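Enforced hill climbing replaces the single neighbor check with a breadth-first search from the current state that stops at the first state with a strictly lower h-value. The sketch below is a simplified illustration on a hypothetical five-state landscape; it leaves out FF's helpful-action pruning and the complete best-first search FF falls back on when enforced hill climbing fails.

```python
from collections import deque


def enforced_hill_climbing(start, is_goal, successors, h):
    """Breadth-first search from the current state until a strictly better
    state is found, then commit to it. Returns the state sequence or None."""
    current, plan = start, [start]
    while not is_goal(current):
        current_h = h(current)
        parent = {current: None}
        queue = deque([current])
        better = None
        while queue and better is None:
            s = queue.popleft()
            for t in successors(s):
                if t in parent:
                    continue
                parent[t] = s
                if h(t) < current_h:      # first strictly better state wins
                    better = t
                    break
                queue.append(t)
        if better is None:
            return None                   # no better state reachable: EHC fails
        segment, node = [], better
        while node is not None:           # states from `better` back to `current`
            segment.append(node)
            node = parent[node]
        plan.extend(reversed(segment[:-1]))  # drop `current`, already in plan
        current = better
    return plan


H = [5, 3, 4, 2, 0]  # hypothetical h-values of states 0..4; state 4 is the goal


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


print(enforced_hill_climbing(0, lambda i: i == 4, line_neighbors, lambda i: H[i]))
# [0, 1, 2, 3, 4]  (escapes the local minimum at state 1 via the worse state 2)
```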
Why do we need exploration?
Best-first search and hill climbing mostly perform greedy exploitation.
Problem: local minima and plateaus
Local Minima and Plateaus
- Local minimum: a non-goal node with a locally best h-value, i.e., no neighbor has a lower h-value
- Plateau: a region in which all nodes have the same h-value
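The two definitions can be turned into a rough local test that only inspects a state's immediate neighbors. This is a simplification: a real plateau is a region, so the test only says whether a state sits on a flat area with no improving neighbor. The landscape and the label strings are hypothetical.

```python
def classify(state, successors, h):
    """Rough local test: compare a non-goal state's h-value with its neighbors'."""
    hv = h(state)
    neighbor_h = [h(s) for s in successors(state)]
    if not neighbor_h:
        return "dead end"
    if any(x < hv for x in neighbor_h):
        return "can improve"      # greedy search still has a better move
    if any(x == hv for x in neighbor_h):
        return "plateau"          # no better neighbor, only equally good ones
    return "local minimum"        # every neighbor is strictly worse


H = [5, 3, 4, 2, 2, 2, 0]  # hypothetical h-values of states 0..6; state 6 is the goal


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


for i in (1, 4, 5):
    print(i, classify(i, line_neighbors, lambda j: H[j]))
# 1 local minimum
# 4 plateau
# 5 can improve
```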
More Exploration
Current algorithms and planners that directly address the trade-off between exploration and exploitation:
- RRT (not for satisficing planning)
- Identidem (stochastic hill climbing)
- Diverse best-first search (not published yet)
- Arvand (Monte-Carlo random walks)
Rapidly-Exploring Random Tree (RRT)
RRT gradually builds a tree in the search space until a path to the goal state is found. At each step, the tree is expanded either towards the goal, which corresponds to exploitation, or towards a randomly selected point in the search space, which corresponds to exploration.
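A minimal RRT sketch in a continuous 2-D space, where RRT is typically used and random points are easy to sample. The obstacle-free unit square and the parameters `goal_bias`, `step`, `tol`, and `max_iters` are illustrative assumptions; `goal_bias` is the probability of growing towards the goal (exploitation) rather than towards a random point (exploration).

```python
import math
import random


def rrt(start, goal, goal_bias=0.1, step=0.05, tol=0.05, max_iters=5000, seed=0):
    """Minimal RRT in the obstacle-free unit square; returns a list of points or None."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]       # parent[k] is the index of node k's parent
    for _ in range(max_iters):
        # Exploitation: grow towards the goal. Exploration: towards a random point.
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.random(), rng.random())
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        near = nodes[i]
        d = math.dist(near, target)
        if d == 0:
            continue
        t = min(1.0, step / d)             # move at most `step` towards the target
        new = (near[0] + t * (target[0] - near[0]), near[1] + t * (target[1] - near[1]))
        nodes.append(new)
        parent.append(i)
        if math.dist(new, goal) <= tol:    # close enough: read the path off the tree
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None


path = rrt((0.1, 0.1), (0.9, 0.9))
print(len(path), path[-1])
```

Sampling `(rng.random(), rng.random())` is trivial here only because the whole space is known explicitly; that is exactly the requirement that makes RRT awkward for implicitly represented planning tasks.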
RRT example
RRT
- RRT requires a complete model of the environment to generate random points for exploration.
- However, current planning domains mostly provide only an implicit representation of the search space.
- Random points might be invalid (one possible workaround is to assume they are valid).
- The distribution of random points is not uniform.
Identidem
Coles and Smith's Identidem introduces exploration by stochastic local search (SLS). Algorithm:
- Local search: action sequences are chosen probabilistically from the set of all possible actions in each state.
- It evaluates the FF heuristic after each action and immediately jumps to the first state that improves on the start state.
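A simplified sketch of the stochastic local search step described above, not Identidem's actual implementation. Uniform random action choice, the fixed probe length, the probe limit, and the landscape are all assumptions; a generic h stands in for the FF heuristic.

```python
import random


def sls_search(start, is_goal, successors, h, probe_len=10, max_probes=1000, seed=0):
    """Identidem-style stochastic local search (simplified sketch).

    Each probe applies randomly chosen actions, evaluates h after EVERY action,
    and jumps to the first state that improves on the probe's start state.
    """
    rng = random.Random(seed)
    current, plan = start, [start]
    for _ in range(max_probes):
        if is_goal(current):
            return plan
        start_h, s, walk = h(current), current, []
        for _ in range(probe_len):
            options = list(successors(s))
            if not options:
                break                      # dead end: abandon this probe
            s = rng.choice(options)        # uniform choice (an assumption)
            walk.append(s)
            if h(s) < start_h:             # first improving state: jump there
                current = s
                plan.extend(walk)
                break
    return plan if is_goal(current) else None


H = [5, 3, 4, 2, 0]  # hypothetical h-values of states 0..4; state 4 is the goal


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


plan = sls_search(0, lambda i: i == 4, line_neighbors, lambda i: H[i])
print(plan)
```

Note that h is computed for every state along a probe; this per-step cost is the point Arvand later changes.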
Diverse best-first search (DBFS)
DBFS diversifies search directions by probabilistically selecting a node that does not have the best heuristic estimate (not published yet).

| | DBFS | GBFS | KBFS |
|---|---|---|---|
| # Solved (of 1612) | 1451 (161 unsolved) | 1209 (403 unsolved) | 1288 (324 unsolved) |

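Since DBFS was unpublished at the time of the talk, the sketch below is not its selection rule; it only illustrates the general idea of sometimes expanding a node that does not have the best heuristic estimate. The parameter `p_random` and the integer toy domain are hypothetical.

```python
import random


def diversified_best_first_search(start, is_goal, successors, h, p_random=0.2, seed=0):
    """Best-first search that, with probability p_random, expands a uniformly
    random open node instead of the one with the best heuristic estimate."""
    rng = random.Random(seed)
    open_nodes = [start]                  # a plain list keeps the sketch short
    parent = {start: None}
    while open_nodes:
        if rng.random() < p_random:
            i = rng.randrange(len(open_nodes))                               # explore
        else:
            i = min(range(len(open_nodes)), key=lambda k: h(open_nodes[k]))  # exploit
        n = open_nodes.pop(i)
        if is_goal(n):
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        for child in successors(n):
            if child not in parent:
                parent[child] = n
                open_nodes.append(child)
    return None


def int_successors(n):  # hypothetical toy domain: integers up to 20, "+1" and "*2"
    return [m for m in (n + 1, n * 2) if m <= 20]


def h(n):
    return abs(10 - n)


print(diversified_best_first_search(1, lambda n: n == 10, int_successors, h, p_random=0.0))
# [1, 2, 4, 8, 9, 10]  (p_random=0 reduces to plain greedy best-first search)
print(diversified_best_first_search(1, lambda n: n == 10, int_successors, h))
```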
Arvand
- Exploration using random walks helps to overcome the problem of local minima and plateaus.
- Jumping greedily exploits the knowledge gained by the random walks.
- Difference from Identidem: only the end states of random walks are evaluated.
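A simplified sketch of the Monte-Carlo random walk idea: run several random walks from the current state, compute h only at each walk's end state, then jump to the best end state. The fixed walk count and length, the goal test on every visited state, and the landscape are assumptions, and refinements of the real planner are omitted.

```python
import random


def random_walk_search(start, is_goal, successors, h, n_walks=20, walk_len=5,
                       max_steps=100, seed=0):
    """Arvand-style Monte-Carlo random walk search (simplified sketch).

    From the current state, run n_walks random walks, evaluate h only at each
    walk's end state, then jump greedily to the best end state found.
    """
    rng = random.Random(seed)
    current, plan = start, [start]
    for _ in range(max_steps):
        if is_goal(current):
            return plan
        best_h, best_walk = None, None
        for _ in range(n_walks):
            s, walk = current, []
            for _ in range(walk_len):
                options = list(successors(s))
                if not options:
                    break
                s = rng.choice(options)
                walk.append(s)
                if is_goal(s):             # cheap goal test on every visited state
                    break
            if not walk:
                continue
            end_h = h(s)                   # the only heuristic call per walk
            if best_h is None or end_h < best_h:
                best_h, best_walk = end_h, walk
        if best_walk is None:
            return None                    # every walk hit a dead end immediately
        plan.extend(best_walk)             # greedy exploitation: jump to best end state
        current = best_walk[-1]
    return plan if is_goal(current) else None


H = [5, 3, 4, 2, 0]  # hypothetical h-values of states 0..4; state 4 is the goal


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


plan = random_walk_search(0, lambda i: i == 4, line_neighbors, lambda i: H[i])
print(plan)
```

Because a walk can pass through worse states before its end is evaluated, the search is not trapped by the local minimum at state 1 that stops plain hill climbing.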
Analysis of Arvand
Fast exploration:
- Exploration using random walks
- Evaluating only the end states makes exploration faster (computing heuristic values takes 90% of the time)
Greedy exploitation: jump to the best node obtained
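A rough estimate of why evaluating only end states speeds up exploration, assuming (hypothetically) that when every generated state is evaluated, heuristic computation takes the 90% share quoted above and everything else 10%, and that a random walk of length L evaluates only its end state. With c(L) the relative cost per generated state:

$$
c(L) = 0.1 + \frac{0.9}{L}, \qquad \text{speedup}(L) = \frac{1}{c(L)}
$$

$$
L = 10:\quad c(10) = 0.1 + 0.09 = 0.19, \qquad \text{speedup}(10) = \frac{1}{0.19} \approx 5.3
$$

As L grows, the speedup approaches 1/0.1 = 10, the limit set by the 10% of the cost that is not heuristic computation.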
Advantages of Arvand
- Escapes from local minima and plateaus, and does so quickly
Coverage of Arvand (current IPC problems are not hard enough)

| | Arvand | LAMA | FF | Fast Downward |
|---|---|---|---|---|
| # Solved (of 1782) | 1641 (92%) | 1581 (89%) | 1389 (78%) | 1374 (77%) |

Still Some Problems
- Wastes a lot of the knowledge gained from random walks
- Sometimes there is a lot of duplication
Arvand-LTS: Arvand with Local MCTS
Motivation:
- Can we use more of the knowledge we get from random walks?
- Selectively grow a search tree while running random walks
Monte-Carlo Random Walk-based Local Tree Search (MRW-LTS)
Framework of MCTS
MRW-LTS
- Every local search builds a local search tree.
- Random walks must start from leaf nodes of the search tree.
- Each node in the tree stores the minimum h-value obtained by random walks starting from its subtree (not the node's own h-value).
- A leaf node is selected by following an ε-greedy strategy at each node.
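A minimal sketch of how the MCTS phases could map onto MRW-LTS: selection by ε-greedy descent on each node's stored minimum, simulation by a random walk from the selected leaf with h evaluated only at its end, and backpropagation of that h-value as a running minimum towards the root. The expansion rule (add all successors of a leaf after its walk), the parameter values, the outer "jump to the best end state" loop, and every name below are assumptions rather than details from the talk.

```python
import math
import random


class Node:
    """Local search tree node. min_h is the lowest h-value seen at the end of
    any random walk started from this node's subtree, not h(state) itself."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.min_h = math.inf


def tree_path(node):
    """States from the tree root down to `node`."""
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]


def local_tree_search(root_state, successors, h, rng, n_walks=50, walk_len=5, epsilon=0.2):
    """One local search; returns the state sequence to the best end state found."""
    root = Node(root_state)
    best_h, best_path = math.inf, [root_state]
    for _ in range(n_walks):
        node = root
        while node.children:                            # epsilon-greedy selection
            if rng.random() < epsilon:
                node = rng.choice(node.children)
            else:
                node = min(node.children, key=lambda c: c.min_h)
        s, walk = node.state, []
        for _ in range(walk_len):                       # random walk from the leaf
            options = list(successors(s))
            if not options:
                break
            s = rng.choice(options)
            walk.append(s)
        end_h = h(s)                                    # evaluate the end state only
        if end_h < best_h:
            best_h, best_path = end_h, tree_path(node) + walk
        n = node
        while n is not None:                            # back up the minimum h-value
            n.min_h = min(n.min_h, end_h)
            n = n.parent
        node.children = [Node(t, node) for t in successors(node.state)]  # grow the tree
    return best_path


def mrw_lts_search(start, is_goal, successors, h, max_steps=100, seed=0):
    rng = random.Random(seed)
    current, plan = start, [start]
    for _ in range(max_steps):
        if is_goal(current):
            return plan
        path = local_tree_search(current, successors, h, rng)
        plan.extend(path[1:])                           # path[0] is `current`
        current = path[-1]
    return plan if is_goal(current) else None


H = [5, 3, 4, 2, 0]  # hypothetical h-values of states 0..4; state 4 is the goal


def line_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(H)]


plan = mrw_lts_search(0, lambda i: i == 4, line_neighbors, lambda i: H[i])
print(plan)
```

Compared with independent random walks, the tree keeps the outcome of earlier walks (through `min_h`) and uses it to decide where later walks start.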
Some Changes
Experiments
1. IPC-2008
2. Big search spaces
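Coverage on IPC-6

| Domain | LAMA | Arvand | Arvand-LTS |
|---|---|---|---|
| Cyber | 100% | 100% | 100% |
| Elevator | 87% | 100% | 100% |
| Openstacks | 100% | 100% | 100% |
| Parcprinter | 77% | 100% | 100% |
| Pegsols | 100% | 100% | 100% |
| Scanalyzer | 100% | 90% | 90% |
| Transport | 100% | 100% | 100% |
| Woodworking | 100% | 100% | 100% |
| Total | 96% | 99% | 99% |
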
Coverage
Summary
1. Exploration is important in satisficing planning.
2. A good balance between exploration and exploitation might make a big difference!