Probabilistic Temporal Planning
(transcript of mausam/papers/tut07.pdf)

PART I: The Problem

Mausam
David E. Smith
Sylvie Thiébaux
Motivation

[Figure: the K9 rover executing a plan: Drive (-1), Dig (5), Visual servo (.2, -.15), NIR]
Reality Bites

[Figure: the same plan annotated with failure points (X), a time window [10, 14:30], and power constraints]

Discrete failures: tracking failure, instrument placement failure, hardware faults and failures
Continuous uncertainty in time & energy: wheel slippage, obstacle avoidance, feature tracking
Alternative Approaches

Replanning: demands processing power; risks to safety, lost opportunities, dead ends
Improving robustness:
  Conservatism: wasteful
  Flexibility: useful but limited
  Conformance: difficult & limited
  Conditionality: very difficult
Technical Challenges

Durative actions
Concurrency
Continuous resources
Time constraints and resource bounds
Oversubscription: goals G1, G2, G3, G4, … with values V1, V2, V3, V4, …

[Figure: a concurrent rover plan (Energy Storage, Visual servo (.2, -.15), Warmup NIR, Lo res, Rock finder, NIR, Comm.) with a time window [10, 14:30]]
Problem Dimensions

[Figure: an agent asks "What action next?", receiving percepts from and sending actions to the environment]

Static vs. dynamic
Full vs. partial satisfaction
Fully vs. partially observable
Perfect vs. noisy
Deterministic vs. stochastic
Instantaneous vs. durative
Sequential vs. concurrent
Discrete vs. continuous outcomes
Predictable vs. unpredictable
Assumptions

World: static ✔
Actions: durative ✔; concurrency ✔; stochastic ✔; discrete outcomes ✖; complete model ✔
Percepts: fully observable ✔; perfect ✔; free ✔
Objective: goals ✖
Probabilistic POCL Approaches

PPDDL-like model of action: no concurrency, no time, no resources
Discrete action outcomes
Planners: C-Buridan, DTPOP, Mahinur, Probapop
Fixable, but: they lack good heuristic guidance and give no guarantees of optimality

[Figure: an action A with condition sets c1, c2, … and d1, d2, …, branching to outcomes O1: p1, p2, … (.7), O2: q1, q2, … (.2), O3: r1, r2, … (.1) and O4: s1, s2, … (.4), O5: t1, t2, … (.6)]
Outline

1. Introduction
2. Basics of probabilistic planning (Mausam)
3. Durative actions w/o concurrency (Mausam)
4. Concurrency w/o durative actions (Sylvie)
5. Durative actions w/ concurrency (Sylvie)
6. Practical considerations
References

Bresina, J.; Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D.; and Washington, R. Planning under continuous time and resource uncertainty: A challenge for AI. UAI-02.
Draper, D.; Hanks, S.; and Weld, D. Probabilistic planning with information gathering and contingent execution. AIPS-94.
Onder, N., and Pollack, M. Conditional, probabilistic planning: A unifying algorithm and effective search control mechanisms. AAAI-99.
Onder, N.; Whelan, G. C.; and Li, L. Engineering a conformant probabilistic planner. JAIR 25.
Peot, M. Decision-Theoretic Planning. Ph.D. Dissertation, Dept. of Engineering-Economic Systems, Stanford University, 1998.
Probabilistic Temporal Planning
PART II: Introduction to Probabilistic Planning Algorithms
Mausam
David E. Smith
Sylvie Thiébaux
Planning

[Figure: the same agent/environment diagram and problem dimensions as on the Problem Dimensions slide: static vs. dynamic; full vs. partial satisfaction; fully vs. partially observable; perfect vs. noisy; deterministic vs. stochastic; instantaneous vs. durative; sequential vs. concurrent; discrete vs. continuous outcomes; predictable vs. unpredictable]
Classical Planning

Static world, full satisfaction, fully observable, perfect percepts;
actions are deterministic, instantaneous, sequential, discrete, with predictable outcomes.
Stochastic Planning

Static world, full satisfaction, fully observable, perfect percepts;
actions are stochastic, instantaneous, sequential, discrete, with unpredictable outcomes.
Markov Decision Process (MDP)

S: a set of states (if factored: a Factored MDP)
A: a set of actions
Pr: transition model
C: cost model (C(a) or C(s,a))
G: a set of goals (absorbing or non-absorbing)
s0: start state
γ: discount factor
R: reward model (R(s) or R(s,a))
Objective of a Fully Observable MDP

Find a policy π: S → A which optimises
  minimises expected cost to reach a goal, or
  maximises expected reward, or
  maximises expected (reward − cost)
given a finite, infinite, or indefinite horizon (discounted or undiscounted),
assuming full observability.
Role of the Discount Factor (γ)

Keeps the total reward / total cost finite
  useful for infinite-horizon problems
  sometimes for indefinite horizons: if there are dead ends

Intuition (economics): money today is worth more than money tomorrow.

Total reward: r1 + γr2 + γ²r3 + …
Total cost: c1 + γc2 + γ²c3 + …
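As a quick numeric check of the discounted-sum formulas above (the rewards and γ here are made-up illustration values):

```python
# Discounted total reward r1 + gamma*r2 + gamma^2*r3 for made-up numbers.
gamma = 0.9
rewards = [1.0, 2.0, 3.0]   # r1, r2, r3

# t = 0, 1, 2 gives the weights 1, gamma, gamma^2.
total = sum(gamma ** t * r for t, r in enumerate(rewards))
print(total)   # 1 + 0.9*2 + 0.81*3
```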
Examples of MDPs

Goal-directed, indefinite-horizon, cost-minimisation MDP: <S, A, Pr, C, G, s0>
  most often studied in the planning community
Infinite-horizon, discounted reward-maximisation MDP: <S, A, Pr, R, γ>
  most often studied in reinforcement learning
Goal-directed, finite-horizon, probability-maximisation MDP: <S, A, Pr, G, s0, T>
  also studied in the planning community
Oversubscription planning (non-absorbing goals, reward-maximisation MDP): <S, A, Pr, G, R, s0>
  a relatively recent model
Bellman Equations for MDP1

<S, A, Pr, C, G, s0>

Define J*(s) (the optimal cost) as the minimum expected cost to reach a goal from state s.

J* should satisfy the following equation:

  J*(s) = 0                                                            if s ∈ G
  J*(s) = min_{a ∈ Ap(s)} [ C(s,a) + Σ_{s' ∈ S} Pr(s'|s,a) J*(s') ]    otherwise
Bellman Equations for MDP2

<S, A, Pr, R, s0, γ>

Define V*(s) (the optimal value) as the maximum expected discounted reward from state s.

V* should satisfy the following equation:

  V*(s) = max_{a ∈ Ap(s)} [ R(s,a) + γ Σ_{s' ∈ S} Pr(s'|s,a) V*(s') ]
Bellman Equations for MDP3

<S, A, Pr, G, s0, T>

Define J*(s,t) (the optimal value) as the maximum probability of reaching a goal from state s starting at timestep t.

J* should satisfy the following equation:

  J*(s,t) = 1                                                 if s ∈ G
  J*(s,T) = 0                                                 if s ∉ G
  J*(s,t) = max_{a ∈ Ap(s)} Σ_{s' ∈ S} Pr(s'|s,a) J*(s',t+1)  otherwise
Bellman Backup

Given an estimate Jn of the J* function, backing up the Jn function at state s calculates a new estimate Jn+1:

  Qn+1(s,a) = C(s,a) + Σ_{s'} Pr(s'|s,a) Jn(s')
  Jn+1(s) = min_{a ∈ Ap(s)} Qn+1(s,a)

Qn+1(s,a) is the value/cost of the strategy: execute action a in s, then execute πn subsequently, where πn = argmin_{a ∈ Ap(s)} Qn(s,a).
Bellman Backup

[Figure: from s0, action a1 (cost 2) leads to s1, action a2 (cost 20) leads to s2 w.p. 0.9 and s3 w.p. 0.1, and action a3 (cost 4) leads to s3; initial estimates J0(s1) = 0, J0(s2) = 1, J0(s3) = 2]

Q1(s0,a1) = 2 + 0 = 2
Q1(s0,a2) = 20 + 0.9 × 1 + 0.1 × 2 = 21.1
Q1(s0,a3) = 4 + 2 = 6

J1(s0) = min(2, 21.1, 6) = 2,  agreedy = a1
Value Iteration

assign an arbitrary assignment of J0 to each state
repeat                                         ← iteration n+1
  for all states s
    compute Jn+1(s) by a Bellman backup at s
until max_s |Jn+1(s) − Jn(s)| < ε              ← ε-convergence; |Jn+1(s) − Jn(s)| is the residual at s
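The pseudocode above can be sketched in Python. The two-state MDP below (its states, costs, and transition probabilities) is a made-up example, not from the tutorial:

```python
# Value iteration for the cost-minimisation MDP1 model <S, A, Pr, C, G, s0>.
GOALS = {"g"}
# ACTIONS[s][a] = (cost, {successor: probability})
ACTIONS = {
    "s0": {"safe": (2.0, {"s1": 1.0}),
           "risky": (1.0, {"g": 0.5, "s0": 0.5})},
    "s1": {"go": (1.0, {"g": 1.0})},
}

def bellman_backup(J, s):
    """Return (J_{n+1}(s), greedy action) computed from the current estimate J."""
    q = {a: c + sum(p * J[t] for t, p in pr.items())
         for a, (c, pr) in ACTIONS[s].items()}
    a_greedy = min(q, key=q.get)
    return q[a_greedy], a_greedy

def value_iteration(eps=1e-6):
    J = {s: 0.0 for s in list(ACTIONS) + list(GOALS)}  # arbitrary J0
    while True:
        residual = 0.0
        for s in ACTIONS:                  # goal states stay at 0
            new, _ = bellman_backup(J, s)
            residual = max(residual, abs(new - J[s]))
            J[s] = new
        if residual < eps:                 # eps-convergence
            return J

J = value_iteration()
print(J["s0"], J["s1"])   # converges to J*(s0) = 2, J*(s1) = 1
```

Here the "risky" action has a self-loop, so J(s0) only approaches its fixed point geometrically; the ε-test on the residual is what terminates the loop.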
Comments

Decision-theoretic algorithm: dynamic programming, fixed-point computation
Probabilistic version of the Bellman-Ford algorithm for shortest-path computation
  MDP1 is the Stochastic Shortest Path problem
Jn → J* in the limit as n → ∞
ε-convergence: the Jn function is within ε of J*
  works only when no state is a dead end (J* is finite)
Monotonicity
  J0 ≤p J* ⇒ Jn ≤p J* (Jn monotonic from below)
  J0 ≥p J* ⇒ Jn ≥p J* (Jn monotonic from above)
  otherwise Jn is non-monotonic
Policy Computation

The optimal policy is stationary and time-independent
  for infinite/indefinite-horizon problems

  π*(s) = argmin_{a ∈ Ap(s)} [ C(s,a) + Σ_{s'} Pr(s'|s,a) J*(s') ]

Policy Evaluation: the value Jπ of a fixed policy π satisfies

  Jπ(s) = C(s,π(s)) + Σ_{s'} Pr(s'|s,π(s)) Jπ(s')

a system of linear equations in |S| variables.
Changing the Search Space
Value Iteration
Search in value space
Compute the resulting policy
Policy Iteration
Search in policy space
Compute the resulting value
Policy Iteration

assign an arbitrary assignment of π0 to each state
repeat
  compute Jn+1, the evaluation of πn                  (policy evaluation)
  for all states s
    compute πn+1(s) = argmin_{a ∈ Ap(s)} Qn+1(s,a)    (policy improvement)
until πn+1 = πn

Advantage: searching in a finite (policy) space, as opposed to an uncountably infinite (value) space ⇒ faster convergence
  all other properties follow!
Costly: policy evaluation is O(n³)
  approximate it by value iteration with the policy held fixed ⇒ Modified Policy Iteration
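A minimal sketch of this loop, with the O(n³) evaluation replaced by iterative sweeps under the fixed policy (i.e. modified policy iteration, as suggested above). The two-state MDP is a made-up example:

```python
# Modified policy iteration on a tiny cost-minimisation MDP.
GOALS = {"g"}
ACTIONS = {
    "s0": {"safe": (2.0, {"s1": 1.0}),
           "risky": (1.0, {"g": 0.5, "s0": 0.5})},
    "s1": {"go": (1.0, {"g": 1.0})},
}

def q_value(J, s, a):
    c, pr = ACTIONS[s][a]
    return c + sum(p * J[t] for t, p in pr.items())

def evaluate(policy, eps=1e-9):
    """Iteratively solve J_pi(s) = C(s,pi(s)) + sum_s' Pr(s'|s,pi(s)) J_pi(s')."""
    J = {s: 0.0 for s in list(ACTIONS) + list(GOALS)}
    while True:
        residual = 0.0
        for s in ACTIONS:
            new = q_value(J, s, policy[s])
            residual = max(residual, abs(new - J[s]))
            J[s] = new
        if residual < eps:
            return J

def policy_iteration():
    policy = {s: next(iter(ACTIONS[s])) for s in ACTIONS}   # arbitrary pi0
    while True:
        J = evaluate(policy)                                # policy evaluation
        improved = {s: min(ACTIONS[s], key=lambda a: q_value(J, s, a))
                    for s in ACTIONS}                       # policy improvement
        if improved == policy:
            return policy, J
        policy = improved

policy, J = policy_iteration()
print(policy["s0"])   # the "risky" action is optimal in this toy MDP
```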
Connection with Heuristic Search

[Figure: three search problems from s0 to G: a regular graph, an acyclic AND/OR graph, and a cyclic AND/OR graph]
Connection with Heuristic Search

  regular graph:         solution is a (shortest) path                       → A*
  acyclic AND/OR graph:  solution is an (expected-shortest) acyclic graph    → AO*
  cyclic AND/OR graph:   solution is an (expected-shortest) cyclic graph     → LAO*

All three algorithms are able to make effective use of reachability information!
LAO*

1. add s0 to the fringe and to the greedy graph
2. repeat
     expand a state on the fringe (in the greedy graph)
     initialise all new states with their heuristic value
     perform value iteration on all expanded states
     recompute the greedy graph
3. until the greedy graph is free of fringe states
4. output the greedy graph as the final policy
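A compact sketch of these steps on a made-up two-action MDP; the heuristic h = 0 is trivially admissible for non-negative costs. This is an illustration of the loop above, not a full implementation:

```python
# LAO*: expand one fringe state of the greedy graph at a time, initialise
# new states with an admissible heuristic h, and re-run VI on expanded states.
GOALS = {"g"}
ACTIONS = {
    "s0": {"safe": (2.0, {"s1": 1.0}),
           "risky": (1.0, {"g": 0.5, "s0": 0.5})},
    "s1": {"go": (1.0, {"g": 1.0})},
}

def h(s):                  # trivially admissible heuristic
    return 0.0

def greedy_action(J, s):
    return min(ACTIONS[s], key=lambda a: ACTIONS[s][a][0] +
               sum(p * J[t] for t, p in ACTIONS[s][a][1].items()))

def greedy_graph(J, expanded, s0):
    """States reachable from s0 by following greedy actions of expanded states."""
    reach, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        if s in GOALS or s not in expanded:
            continue                        # goal or fringe: stop here
        for t in ACTIONS[s][greedy_action(J, s)][1]:
            if t not in reach:
                reach.add(t)
                stack.append(t)
    return reach

def lao_star(s0, eps=1e-6):
    J, expanded = {s0: h(s0)}, set()
    while True:
        fringe = [s for s in greedy_graph(J, expanded, s0)
                  if s not in expanded and s not in GOALS]
        if not fringe:                      # greedy graph free of fringe states
            return J
        s = fringe[0]                       # expand a fringe state
        expanded.add(s)
        for a in ACTIONS[s]:
            for t in ACTIONS[s][a][1]:
                J.setdefault(t, 0.0 if t in GOALS else h(t))
        while True:                         # VI on the expanded states
            residual = 0.0
            for u in expanded:
                c, pr = ACTIONS[u][greedy_action(J, u)]
                new = c + sum(p * J[t] for t, p in pr.items())
                residual, J[u] = max(residual, abs(new - J[u])), new
            if residual < eps:
                break

J = lao_star("s0")
print(J["s0"])   # converges to the optimal expected cost for s0
```

Note that s1 never needs to be expanded here: LAO* returns a partial policy closed with respect to s0 and the greedy transitions.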
LAO* [Iteration 1]
[Figures: add s0 to the fringe and the greedy graph; expand a state on the fringe in the greedy graph; initialise all new states with their heuristic values h and perform VI on the expanded states (J1); recompute the greedy graph]

LAO* [Iteration 2]
[Figures: expand a state on the fringe and initialise the new states; perform VI (J2) and compute the greedy policy]

LAO* [Iteration 3]
[Figures: expand a fringe state, reaching G; perform VI (J3) and recompute the greedy graph]

LAO* [Iteration 4]
[Figures: one more expansion and VI pass (J4); LAO* stops when all nodes in the greedy graph have been expanded]
Comments
Dynamic Programming + Heuristic Search
admissible heuristic optimal policy
expands only part of the reachable state space
outputs a partial policyone that is closed w.r.t. to Pr and s0
Speedups
expand all states in fringe at once
perform policy iteration instead of value iteration
perform partial value/policy iteration
weighted heuristic: f = (1-w).g + w.h
ADD based symbolic techniques (symbolic LAO*)
Real-Time Dynamic Programming

Trial: simulate the greedy policy starting from the start state, performing a Bellman backup on each visited state.
RTDP: repeat trials until the cost function converges.
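The trial loop can be sketched as follows. The two-state MDP, the trial count, and the cap on trial length are made-up illustration choices:

```python
# RTDP: simulate the greedy policy from s0, backing up every visited state.
import random

GOALS = {"g"}
ACTIONS = {
    "s0": {"safe": (2.0, {"s1": 1.0}),
           "risky": (1.0, {"g": 0.5, "s0": 0.5})},
    "s1": {"go": (1.0, {"g": 1.0})},
}

def backup(J, s):
    """Bellman backup: update J(s) in place and return the greedy action."""
    q = {a: c + sum(p * J[t] for t, p in pr.items())
         for a, (c, pr) in ACTIONS[s].items()}
    a = min(q, key=q.get)
    J[s] = q[a]
    return a

def rtdp(s0, trials=500, max_len=100, seed=0):
    rng = random.Random(seed)
    J = {s: 0.0 for s in list(ACTIONS) + list(GOALS)}   # admissible J0 = 0
    for _ in range(trials):
        s = s0
        for _ in range(max_len):        # cap trials that loop for a long time
            if s in GOALS:
                break
            a = backup(J, s)                            # backup on visited state
            succ, probs = zip(*ACTIONS[s][a][1].items())
            s = rng.choices(succ, probs)[0]             # simulate greedy action
    return J

J = rtdp("s0")
print(J["s0"])   # approaches the optimal expected cost for s0
```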
RTDP Trial

[Figure: from s0, compute Qn+1(s0,a) for each action a1, a2, a3 from the current Jn values of its successors; Jn+1(s0) = min_a Qn+1(s0,a); here agreedy = a2, so the trial samples a successor of a2 and continues toward the goal]
Comments

Properties: if all states are visited infinitely often, then Jn → J*
Advantages: anytime; more probable states are explored quickly
Disadvantages: complete convergence is slow, and there is no termination condition
Labeled RTDP

Initialise J0 with an admissible heuristic
  ⇒ Jn monotonically increases
Label a state as solved once its Jn value has converged
Stop trials when they reach any solved state
Terminate when s0 is solved

[Figure: a state s whose best action leads toward G; if s reaches G through a state t, both s and t get solved together]
Properties

admissible J0 ⇒ optimal J*
heuristic-guided: explores a subset of the reachable state space
anytime: focusses attention on more probable states
fast convergence: focusses attention on unconverged states
terminates in finite time
Recent Advances: Bounded RTDP

Associate with each state
  a lower bound (lb): used for simulation
  an upper bound (ub): used for policy computation
  gap(s) = ub(s) − lb(s)
Terminate a trial when gap(s) < ε
Bias successor sampling towards unconverged states, proportional to Pr(s'|s,a) · gap(s')
Perform backups in reverse order along the current trajectory
Recent Advances: Focused RTDP

Similar to Bounded RTDP, except
  a more sophisticated definition of priority that combines the gap and the probability of reaching the state
  adaptively increasing the maximum trial length

Recent Advances: Learning DFS

An Iterative-Deepening-A* equivalent for MDPs
Finds strongly connected components to check whether a state is solved
Other Advances

Ordering the Bellman backups to maximise information flow
Partitioning the state space and combining value iterations from different partitions
An external-memory version of value iteration
Policy Gradient Approaches

Direct policy search
  parameterised policy Pr(a|s,w)
  no value function
  flexible memory requirements

Policy gradient
  J(w) = E_w[ Σ_{t=0..∞} γ^t c_t ]
  gradient descent (w.r.t. w) reaches a local optimum
  continuous/discrete spaces

[Figure: a parameterised policy Pr(a|s,w) with parameters w maps state s to action probabilities Pr(a=a1|s,w), …, Pr(a=ak|s,w)]
Policy Gradient Algorithm

J(w) = E_w[ Σ_{t=0..∞} γ^t c_t ]

Minimise J by
  computing the gradient
  stepping the parameters: w_{t+1} = w_t − α ∇J(w_t)
  until convergence

Gradient estimate: a Monte Carlo estimate from a trace s1, a1, c1, …, sT, aT, cT
  e_{t+1} = β e_t + ∇_w log Pr(a_{t+1} | s_t, w_t)    (eligibility trace)
  w_{t+1} = w_t − α_t c_t e_{t+1}
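A toy REINFORCE-style instance of this update, using the per-step gradient directly (no eligibility-trace decay). The one-state, two-action problem, the softmax parameterisation, and the step size are all made-up assumptions for illustration:

```python
# Gradient descent on expected cost, w <- w - alpha * c * grad log Pr(a|w).
import math
import random

COSTS = {0: 1.0, 1: 0.0}            # made-up costs: action 1 is cheaper

def policy(w):
    """Softmax policy Pr(a|w) over the two actions."""
    z = [math.exp(w[a]) for a in (0, 1)]
    s = sum(z)
    return [zi / s for zi in z]

def grad_log(w, a):
    """grad_w log Pr(a|w) for the softmax parameterisation."""
    pr = policy(w)
    return [(1.0 if b == a else 0.0) - pr[b] for b in (0, 1)]

rng = random.Random(0)
w, alpha = [0.0, 0.0], 0.1
for _ in range(2000):
    a = rng.choices((0, 1), policy(w))[0]               # sample an action
    g = grad_log(w, a)
    w = [wi - alpha * COSTS[a] * gi for wi, gi in zip(w, g)]

print(policy(w)[1])   # probability of the cheap action grows toward 1
```

In expectation the update direction is −α ∇_w E_w[c], so the policy drifts toward the lower-cost action, a local optimum that here is also global.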
Policy Gradient Approaches

Often used in reinforcement learning
  partial observability
  model-free (Pr and Pr(o|s) are unknown)
  learns a policy directly from observations and costs

[Figure: a reinforcement learner with policy Pr(a|o,w) receives an observation o (generated via Pr(o|s)) and a cost c from the world/simulator, and sends back an action a]
LP Formulation of MDPs

maximise Σ_{s ∈ S} α(s) J*(s)

under the constraints
  for s ∈ G:       J*(s) = 0
  for every s, a:  J*(s) ≤ C(s,a) + Σ_{s' ∈ S} Pr(s'|s,a) J*(s')
with weights α(s) > 0.
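This LP can be written down directly for a made-up two-state chain s0 → s1 → g where each step costs 1 (so J*(s0) = 2, J*(s1) = 1), here using scipy (assumed available) and uniform weights α(s) = 1:

```python
# LP formulation of a tiny cost-minimisation MDP with deterministic moves.
from scipy.optimize import linprog

# Variables x = [J(s0), J(s1)]; linprog minimises, so negate the objective:
# maximise J(s0) + J(s1)  ==  minimise -J(s0) - J(s1).
c = [-1.0, -1.0]

# Constraints J(s) <= C(s,a) + sum_s' Pr(s'|s,a) J(s'):
#   J(s0) - J(s1) <= 1    (move from s0 to s1, cost 1)
#   J(s1)         <= 1    (move from s1 to the goal g, cost 1; J(g) = 0)
A_ub = [[1.0, -1.0],
        [0.0,  1.0]]
b_ub = [1.0, 1.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)   # the optimum saturates both constraints: J*(s0) = 2, J*(s1) = 1
```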
Modeling Complex Problems

Modeling time
  a continuous variable in the state space
  discretisation issues
  large state space
Modeling concurrency
  many actions may execute at once
  large action space
Modeling time and concurrency
  large state and action space!!

[Figure: value functions J(s) plotted as (piecewise) functions of time t]
References
Simple statistical gradient following algorithms for connectionist reinforcement learning. R. J. Williams. Machine Learning, 1992.
Learning to Act using Real-Time Dynamic Programming. Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh. Artificial Intelligence, 1995.
Policy Gradient Methods for Reinforcement Learning with Function Approximation. Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour. NIPS 1999.
Infinite-Horizon Policy-Gradient Estimation. Jonathan Baxter and Peter L. Bartlett. JAIR 2001.
LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops. E.A. Hansen and S. Zilberstein. Artificial Intelligence, 2001.
Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming. Blai Bonet and Héctor Geffner. ICAPS 2003.
References
Bounded Real-Time Dynamic Programming: RTDP with monotone upper bounds and performance guarantees. H. Brendan McMahan, Maxim Likhachev, and Geoffrey Gordon. ICML 2005.
Learning Depth-First Search: A Unified Approach to Heuristic Search in Deterministic and Non-Deterministic Settings, and its application to MDPs. Blai Bonet and Héctor Geffner. ICAPS 2006.
Focused Real-Time Dynamic Programming for MDPs: Squeezing More Out of a Heuristic, Trey Smith and Reid Simmons. AAAI 2006.
Prioritization Methods for Accelerating MDP Solvers. David Wingate, Kevin Seppi. JMLR 2005.
Topological Value Iteration Algorithm for Markov Decision Processes. Peng Dai, Judy Goldsmith. IJCAI 2007.
Prioritizing Bellman Backups Without a Priority Queue. Peng Dai and Eric Hansen. ICAPS 2007.
External Memory Value Iteration. Stefan Edelkamp, Shahid Jabbar and Blai Bonet. ICAPS 2007.
Probabilistic Temporal Planning
PART III: Durative Actions without Concurrency
Mausam
David E. Smith
Sylvie Thiébaux
![Page 59: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/59.jpg)
Stochastic Planning w/ Durative Actions
[Figure: agent-environment loop ("What action next?"): the agent receives percepts and issues actions.]
Environment characteristics: static, fully observable, perfect sensing, stochastic, durative, sequential, unpredictable, discrete/continuous.
Motivation
Why are durative actions important?
- race against time: deadlines
- increase reward (single goal): time-dependent reward
- increase reward (many non-absorbing goals): oversubscription planning, i.e. achieve as many goals as possible in the given time
Why is uncertainty important?
- durations could be uncertain
- we may decide the next action based on the time taken by the previous ones
Different Related Models
Expressiveness: MDP < Semi-MDP; Time-dependent MDP < Hybrid MDP
- MDP: no explicit action durations
- Semi-MDP: continuous/discrete action durations; discounted/undiscounted (discounting with action durations)
- Time-dependent MDP: discrete MDP + one continuous variable, time; undiscounted deadline problems
- Continuous MDP: MDP with only continuous variables
- Hybrid MDP: MDP with many discrete and continuous variables
Undiscounted/Discrete-time/No-deadline
- embed the duration information in C or R
- to minimise make-span, initialise C by the action's duration
- but the duration may be probabilistic
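Since the objective is undiscounted expected make-span, a probabilistic duration can be folded into the cost via its expectation. A minimal value-iteration sketch over a hypothetical two-action chain (domain, names, and numbers are all illustrative, not from the slides):

```python
# Value iteration minimising expected make-span: each action's cost is its
# expected duration, so no explicit time variable is needed.
# Hypothetical toy chain: s0 --drive--> s1 --dig--> goal.

def expected(dur_dist):
    """Expected duration of a probabilistic duration distribution."""
    return sum(p * d for d, p in dur_dist.items())

# actions[state] = list of (name, duration pmf, transition dict)
actions = {
    "s0": [("drive", {2: 0.5, 4: 0.5}, {"s1": 1.0})],
    "s1": [("dig",   {5: 1.0},         {"goal": 0.8, "s1": 0.2})],
}
GOAL = "goal"

V = {s: 0.0 for s in list(actions) + [GOAL]}
for _ in range(200):  # iterate to (near) convergence
    for s, acts in actions.items():
        V[s] = min(expected(dur) + sum(p * V[t] for t, p in trans.items())
                   for _, dur, trans in acts)
```

Here V("s1") converges to 5 / 0.8 = 6.25 and V("s0") to 3 + 6.25 = 9.25.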
Discounted/Discrete-time/No-deadline
A single Semi-MDP.
Undiscounted/Discrete-time/Deadline
Time-dependent MDP: V* depends on the current state and the current time.
[Figure: reward as a function of time.]
Undiscounted/Continuous-time/Deadline
The summation is now an integral!
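In continuous time, the backup's sum over arrival times becomes an integral over a duration density. A sketch of the Bellman update for a time-dependent value function (the duration-density notation $p_{s,a}$ is ours, not the slides'):

```latex
V(s,t) \;=\; \min_{a \in A(s)} \Big[\, C(s,a,t) \;+\;
  \sum_{s' \in S} \Pr(s' \mid s,a) \int_{0}^{\infty} p_{s,a}(\delta)\, V(s',\, t+\delta)\, d\delta \,\Big]
```

with the terminal condition that V(s, t) equals the deadline penalty (or zero reward) once t exceeds the deadline.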
Discounted/Continuous-time/No-deadline
The Bellman backup involves convolutions of value functions with duration distributions.
Algorithms
All previous algorithms extend with new Bellman update rules, e.g. value iteration, policy iteration, linear programming.
Computational/representational challenges:
- efficient representation of continuous value functions
- efficient computation of convolutions
Algorithm extensions:
- reachability analysis in continuous space?
Representation of Continuous Functions
- flat discretisation: costly!
- piecewise constant: models deadline problems
- piecewise linear: models minimise-make-span problems
- phase-type distributions: approximate arbitrary probability density functions; the value function becomes piecewise gamma
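A minimal sketch of why piecewise-constant value functions suit deadline problems: sample the value function on a time grid and convolve it with a discrete duration distribution, which only shifts and mixes its breakpoints. The deadline (t = 50), horizon, and durations below are illustrative:

```python
# Convolve a grid-sampled piecewise-constant value function with a discrete
# duration distribution: V_new(t) = sum_d p(d) * V(t + d).
# Illustrative deadline problem: reward 1 iff the goal is reached by t = 50.

HORIZON = 100
V = [1.0 if t < 50 else 0.0 for t in range(HORIZON + 1)]
durations = {30: 0.5, 60: 0.5}  # action takes 30 or 60 time units

def convolve(V, durations, horizon):
    out = []
    for t in range(horizon + 1):
        out.append(sum(p * (V[t + d] if t + d <= horizon else 0.0)
                       for d, p in durations.items()))
    return out

Q = convolve(V, durations, HORIZON)  # still piecewise constant
```

Starting before t = 20, only the 30-unit outcome beats the deadline, so Q is 0.5 there and 0 afterwards: the result remains piecewise constant with shifted breakpoints.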
Convolutions
[Figure: a "convolution engine" combines a value function over time (breakpoint at 50) with a duration probability density (mass around 80) to produce the resulting value function (breakpoints at 50 and 80).]
Result of convolutions
Rows: probability density function type; columns: value function type.

| pdf \ value function | discrete | constant | linear |
|---|---|---|---|
| discrete | discrete | constant | linear |
| constant | constant | linear | quadratic |
| linear | linear | quadratic | cubic |
Convolutions
[Figure: worked examples of discrete-discrete, constant-discrete, and constant-constant convolutions.]
Analytical solution to convolutions
- the duration probability density is approximated, once, as a phase-type distribution, e.g. p(t) = λe^{−λt}
- the value function is then piecewise gamma
- convolutions can be computed analytically!
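Why the piecewise-gamma family is closed under these convolutions: convolving an exponential duration density with a single gamma-type term c·tⁿe^{−λt} of the value function stays in the family. A sketch for matching rates (the slides' general phase-type case is more involved):

```latex
\int_{0}^{t} \lambda e^{-\lambda \tau}\; c\,(t-\tau)^{n} e^{-\lambda (t-\tau)}\, d\tau
\;=\; c\,\lambda\, e^{-\lambda t} \int_{0}^{t} (t-\tau)^{n}\, d\tau
\;=\; \frac{c\,\lambda}{n+1}\; t^{\,n+1} e^{-\lambda t}
```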
Hybrid AO*
- search in the discrete state space
- associate piecewise-constant value functions with each discrete node
- employ sophisticated continuous reachability
Hybrid AO*
[Figure: AND/OR search graph: Navigate(Start, R1) with outcome probabilities 0.75/0.25, followed by TakePic(R1) for reward $10; Q- and V-functions at the nodes are combined by convolution and max operations.]
Hybrid AO*
[Figure: two competing actions, Navigate(Start, R1) (outcomes 0.75/0.25) and Navigate(Start, R2) (outcomes 0.25/0.75); since values are functions of time, there can be many greedy successors.]
Hybrid AO*
[Figure: the same search graph annotated with probability functions P and Q/V value functions.]
- convolve value functions (backward)
- convolve probability functions (forward)
References
Markov Decision Processes: Discrete Stochastic Dynamic Programming. Martin Puterman. John Wiley and Sons 1994.
Dynamic Programming and Optimal Control. Dimitri Bertsekas. Athena Scientific 1995.
Exact Solutions to Time-Dependent MDPs. Justin Boyan and Michael Littman. NIPS 2000.
Dynamic Programming for Structured Continuous Markov Decision Problems. Zhengzhu Feng, Richard Dearden, Nicolas Meuleau, and Richard Washington. UAI 2004.
Lazy approximation for solving continuous finite-horizon MDPs. Lihong Li and Michael L. Littman. AAAI 2005.
Planning with Continuous Resources in Stochastic Domains. Mausam, Emmanuel Benazera, Ronen Brafman, Nicolas Meuleau, Eric Hansen. IJCAI 2005.
A Fast Analytical Algorithm for Solving Markov Decision Processes with Real-Valued Resources. J. Marecki, Sven Koenig and Milind Tambe. IJCAI 2007.
Probabilistic Temporal Planning
PART IV: Concurrency w/o Durative Actions
Mausam, David E. Smith, Sylvie Thiébaux
Stochastic Planning
Plan for Part IV
- Concurrent MDP (CoMDP) Model
- Value-Based Algorithms
- Planning Graph Approaches
- Policy Gradient Approaches
- Related Models
Concurrent MDPs (CoMDPs)
- formally introduced by Mausam & Weld [AAAI-04]
- an MDP that allows simultaneous execution of action sets
- ≠ semi-MDPs, where time is explicit but concurrency is lacking
- the cost of an action set accounts for time and resources
- notion of concurrency (mutex), generalising independence (deterministic actions a and b are independent iff a; b ≡ b; a):
  - restrictive: all executions of the actions are independent
  - permissive: some execution is independent; requires failure states
Concrete Independence Example
Probabilistic STRIPS:
- each action has a set of preconditions and a probability distribution over a set of outcomes
- each outcome has sets of positive and negative effects
- an outcome set is consistent when no outcome deletes a positive effect or a precondition of another action
A set of actions is independent when:
- restrictive: all joint outcomes of the actions are consistent
- permissive: at least one joint outcome is consistent
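The two notions can be checked by enumerating joint outcomes. A sketch with hypothetical probabilistic-STRIPS actions encoded as dicts (the actions and propositions are invented for illustration):

```python
from itertools import product

# Hypothetical probabilistic-STRIPS actions: preconditions plus outcomes
# given as (probability, add-set, delete-set).
a = {"pre": {"p"}, "outcomes": [(0.8, {"q"}, set()), (0.2, set(), {"p"})]}
b = {"pre": {"p"}, "outcomes": [(1.0, {"r"}, set())]}

def joint_consistent(acts, outs):
    """Consistent iff no outcome deletes another action's positive effect
    or another action's precondition (the slide's definition)."""
    for i, (_, _, dels) in enumerate(outs):
        for j, (_, adds, _) in enumerate(outs):
            if i != j and (dels & adds or dels & acts[j]["pre"]):
                return False
    return True

def independence(acts):
    joints = product(*(act["outcomes"] for act in acts))
    flags = [joint_consistent(acts, outs) for outs in joints]
    # restrictive: all joint outcomes consistent; permissive: at least one
    return all(flags), any(flags)

restrictive, permissive = independence([a, b])
```

Here a's 0.2-outcome deletes p, the precondition of b, so the pair is permissively but not restrictively independent.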
Concurrent MDPs (CoMDPs)
MDP equivalent to a CoMDP: a CoMDP 〈S, A, Pr, C, G, s0〉 translates into the MDP 〈S, A||, Pr||, C||, G, s0〉 where:
- A||(s): mutex-free subsets of actions A = {a1, ..., ak} ⊆ A(s)
- due to independence,
  Pr||(s′ | s, A) = ∑_{s1∈S} ∑_{s2∈S} ... ∑_{s_{k−1}∈S} Pr(s1 | s, a1) Pr(s2 | s1, a2) ... Pr(s′ | s_{k−1}, ak)
- C||(A) = ∑_{i=1..k} res(ai) + max_{i=1..k} dur(ai)
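A sketch of building A||(s) and C||(A) from the slide's formulas; the mutex relation, resource costs, and durations below are illustrative inputs, not part of the model:

```python
from itertools import combinations

# Hypothetical applicable actions with resource costs and durations.
applicable = ["a1", "a2", "a3"]
mutex = {("a1", "a2")}           # a1 and a2 cannot run together
res = {"a1": 2, "a2": 1, "a3": 3}
dur = {"a1": 4, "a2": 2, "a3": 5}

def mutex_free(combo):
    return not any((x, y) in mutex or (y, x) in mutex
                   for x, y in combinations(combo, 2))

def action_sets(acts):
    """A||(s): all non-empty mutex-free subsets of A(s)."""
    sets = []
    for k in range(1, len(acts) + 1):
        sets += [c for c in combinations(acts, k) if mutex_free(c)]
    return sets

def cost(combo):
    """C||(A): sum of resource costs plus the max duration."""
    return sum(res[a] for a in combo) + max(dur[a] for a in combo)

A_par = action_sets(applicable)
```

With the mutex pair excluded, A||(s) has five members here, e.g. C||({a1, a3}) = (2 + 3) + max(4, 5) = 10.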
Plan for Part IV
- Concurrent MDP (CoMDP) Model
- Value-Based Algorithms
- Planning Graph Approaches
- Policy Gradient Approaches
- Related Models
Value-Based Algorithms
- compute a proper optimal policy for the CoMDP
- dynamic programming, e.g., RTDP, applies:
  J||n(s) = min_{A ∈ A||(s)} Q||n(s, A)
- need to mitigate the exponential blowup in A||:
  1. pruning Bellman backups
  2. sampling Bellman backups
[Figure: state s with candidate combinations {a1}, {a2}, {a3}, {a1,a2}, {a1,a3}, {a2,a3}, {a1,a2,a3}.]
Pruning Bellman Backups
Theorem (Mausam & Weld AAAI-04): Let Un be an upper bound on J||n(s). If
  Un < max_{i=1..k} Q||n(s, {ai}) + C||(A) − ∑_{i=1..k} C||({ai})
then combination A is not optimal for state s in this iteration.

Combo-skipping pruning rule:
1. compute Q||n(s, {a}) for all applicable single actions
2. set Un ← Q||n(s, A*_{n−1}), using the optimal combination A*_{n−1} at the previous iteration
3. apply the theorem
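The test itself costs only a few comparisons once the single-action Q-values are in hand. A sketch with hypothetical Q-values and costs:

```python
# Combo-skipping test (after Mausam & Weld AAAI-04): skip combination A
# whenever U_n < max_i Q(s,{a_i}) + C(A) - sum_i C({a_i}), since A then
# cannot be optimal for s in this iteration.
def skip_combo(U, q_single, c_combo, c_single):
    lower = max(q_single) + c_combo - sum(c_single)
    return U < lower   # True => prune A for this iteration

# Hypothetical numbers: singles with Q-values 8 and 9, C({a_i}) = 3 each,
# C(A) = 4, so the bound is 9 + 4 - 6 = 7.
```

An upper bound U = 6.5 lets A be skipped; U = 7.5 does not.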
Pruning Bellman Backups
Theorem (Bertsekas 1995): Let L be a lower bound on Q*||(s, A) and U an upper bound on J*||(s). If L > U then A is not optimal for s.

Combo-elimination pruning rule:
1. initialise RTDP estimates with an admissible heuristic; the Q||n(s, A) remain lower bounds
2. set U to the optimal cost of the serial MDP
3. apply the theorem

Combo-skipping is cheap but has short-term benefits (try it first); combo-elimination is expensive but its pruning is definitive.
Sampling Bellman Backups
Back up random combinations:
- bias towards action sets with previously best Q-values
- bias towards action sets built from the best individual actions
Loss of optimality; J||n(s) might not increase monotonically:
- do a full backup when convergence is asserted for a state
- use the (scaled-down) result as a heuristic for pruned RTDP
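One way to realise the first bias is weighted sampling over combinations, inversely proportional to their previous Q-values (costs, so lower is better). A sketch with invented combinations and Q-values:

```python
import random

# Sampled Bellman backup sketch: rather than evaluating every combination,
# back up a random subset biased toward previously good (low-Q) ones.
def sample_combos(combos, q_prev, k, rng=random.Random(0)):
    weights = [1.0 / (1e-9 + q_prev[c]) for c in combos]
    return rng.choices(combos, weights=weights, k=k)

combos = [("a1",), ("a2",), ("a1", "a2")]
q_prev = {("a1",): 10.0, ("a2",): 5.0, ("a1", "a2"): 2.0}
picked = sample_combos(combos, q_prev, k=3)
```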
Plan for Part IV
- Concurrent MDP (CoMDP) Model
- Value-Based Algorithms
- Planning Graph Approaches
- Policy Gradient Approaches
- Related Models
Planning Graph Approaches
Motivated by the need to compress the state space.
The planning graph data structure facilitates this by:
- exploiting a probabilistic STRIPS representation
- using problem relaxations to find cost lower bounds
- enabling goal regression
History
Graphplan [Blum & Furst IJCAI-95]: classical, concurrent, optimal; uses the graph as a heuristic and for goal-regression search.
TGraphplan [Blum & Langford ECP-99]: replanner, concurrent, non-optimal; returns the most-likely trajectory to the goal.
PGraphplan [Blum & Langford ECP-99]: probabilistic contingent, non-concurrent, optimal; the probabilistic graph yields a heuristic for DP.
Paragraph [Little & Thiebaux ICAPS-06]: probabilistic contingent, concurrent, optimal; extends the full Graphplan framework.
Paragraph
- solves concurrent probabilistic STRIPS planning problems
- finds a concurrent contingency plan with the smallest failure probability within a time horizon
  ⇒ a goal-directed, finite-horizon, probability-maximisation CoMDP
- has a cyclic version
Paragraph
1. Builds the probabilistic planning graph until G ⊆ Pi and G is mutex-free.
2. Attempts plan extraction:
   - uses goal-regression search to find all trajectories that Graphplan would find
   - some of those link naturally; additionally links other trajectories using forward simulation
3. Alternates graph expansion and plan extraction until the time horizon is exceeded or a plan of cost 0 is found (or goal unreachability can be proven).
Planning Graph (Probabilistic)
Levels for actions, propositions, and outcomes, plus mutexes.
[Figure: proposition level P0 (p, q, r, s, t), action level A1 (action a1 and noops nr, ns), outcome level O1 (o1 at 80%, o2 at 20%, noop outcomes or, os at 100%), and proposition level P1.]
Goal-Regression Search (Probabilistic)
[Figure: regression search graph over goal sets such as {g1,g2,g3}, {g1,g6}, {g4,g1}, {g7}, with action sets {a1,a2}, {a3,a2}, joint outcomes (o11,o21), (o12,o22), (o12,o21), (o11,o22), and associated world-state sets {s1,s2}, {s3}, {s4}, {s5,s6}.]
- nodes: goal set, action sets, set of world states, cost, (time)
- arcs: joint outcome, (world state for conditional arcs)
- requires extra linking via forward simulation
Why do we need extra linking?
[Figure: planning-graph and regression-search fragments over time steps t = 0, 1, 2 for a two-action example: a1 (outcomes o1, o2) and a2 (outcomes o3, o4) each achieve pg from p1 or p2.]
I = {p1, p2}, G = {pg}.
Optimal plan: execute one action; if it fails, execute the other. Goal regression alone would not link the trajectories of this contingent plan; forward simulation provides the extra links.
Plan Extraction
- ends with forward simulation and backward cost update
- each node/world-state pair yields a potential plan step
- select pairs and action sets with optimal cost

Cost (probability of failure) of a node/world-state pair:
C(n, sn) = 0 if n is a goal node, and otherwise
C(n, sn) = min_{A ∈ act(n)} ∑_{O ∈ Out(A)} Pr(O) × min_{n′ ∈ succ(n,O,sn)} C(n′, res(O, sn))
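The recursion memoises naturally over node/world-state pairs. A sketch over a tiny hypothetical AND/OR structure in which dead ends cost 1 (certain failure); the graph, states, and probabilities are invented:

```python
# Plan-extraction cost sketch: C(n, s) = 0 at goal nodes, 1 at dead ends,
# otherwise the best action set's expected cost over its joint outcomes.
# Each action set is a list of (Pr(O), [(successor node, successor state)]).
graph = {
    "goal": {"goal": True, "action_sets": []},
    "fail": {"goal": False, "action_sets": []},   # dead end
    "root": {"goal": False, "action_sets": [
        [(0.8, [("goal", "s1")]), (0.2, [("fail", "s2")])],
        [(0.5, [("goal", "s1")]), (0.5, [("fail", "s2")])],
    ]},
}

def cost(node, state, graph, memo=None):
    memo = {} if memo is None else memo
    key = (node, state)
    if key not in memo:
        info = graph[node]
        if info["goal"]:
            memo[key] = 0.0
        elif not info["action_sets"]:
            memo[key] = 1.0   # failure probability 1 at a dead end
        else:
            memo[key] = min(
                sum(p * min(cost(n2, s2, graph, memo) for n2, s2 in succs)
                    for p, succs in outcomes)
                for outcomes in info["action_sets"])
    return memo[key]

c = cost("root", "s0", graph)  # best action set fails with probability 0.2
```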
Plan for Part IV
- Concurrent MDP (CoMDP) Model
- Value-Based Algorithms
- Planning Graph Approaches
- Policy Gradient Approaches
- Related Models
Policy Gradient Approaches
Minimise the expected cost of a parameterised policy by gradient descent in the parameter space.
[Figure: a reinforcement learner interacting with a world/simulator: the learner receives observations o and cost c, and issues action a with Pr(a | o, w); the world evolves with Pr(s′ | a, s) and emits observations with Pr(o | s); e.g. Pr(a=a1 | o,w) = 0.5, Pr(a=a2 | o,w) = 0.1, Pr(a=a3 | o,w) = 0.4.]
Factored policy gradient
- need to mitigate the blowup caused by CoMDPs
- factorise the CoMDP policy into individual action policies [Peshkin et al. UAI-00, Aberdeen & Buffet ICAPS-07]
[Figure: each eligible action a1, a3 has its own policy deciding yes/no from observations, e.g. Pr(a1=yes | o1,w1) = 0.1, Pr(a3=yes | o2,w2) = 0.5; an ineligible action's choice is disabled, e.g. Pr(a2=no | o2,w2) = 1.0; the chosen action set A drives the world via Pr(s′ | A, s).]
Factored policy gradient
Theorem (Peshkin et al., UAI-00): For factored policies, the factored policy gradient is equivalent to the joint policy gradient. Every strict Nash equilibrium is a local optimum for policy gradient in the space of parameters of a factored policy, but not vice versa.

FPG planner [Aberdeen & Buffet, 2007]:
- did well in the probabilistic planning competition
- has a more efficient parallel version
- the cost function favours reaching the goal as soon as possible
- individual policies are linear networks with probability function:
  Pr(ai,t = yes | ot, wi) = 1 / (exp(ot · wi) + 1)
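The slide's logistic rule and the independent per-action choices can be sketched in a few lines; the observation vector and weight values below are hypothetical parameters, not FPG's:

```python
import math
import random

# FPG-style factored action selection: each eligible action has its own
# linear network deciding yes/no via Pr(yes | o, w) = 1 / (1 + exp(o . w)).
def p_yes(obs, w):
    return 1.0 / (1.0 + math.exp(sum(o * wi for o, wi in zip(obs, w))))

def choose_action_set(obs, weights, rng=random.Random(0)):
    """Independently switch each eligible action on or off."""
    return {a for a, w in weights.items() if rng.random() < p_yes(obs, w)}

weights = {"a1": [0.0, 0.0], "a2": [5.0, 5.0]}  # hypothetical parameters
obs = [1.0, 1.0]
chosen = choose_action_set(obs, weights)
```

With zero weights an action is switched on half the time; large positive weights drive Pr(yes) towards zero.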
Plan for Part IV
- Concurrent MDP (CoMDP) Model
- Value-Based Algorithms
- Planning Graph Approaches
- Policy Gradient Approaches
- Related Models
Related Models
- range of decentralised MDP models [Goldman & Zilberstein AIJ-04]
- Composite MDPs [Singh & Cohn NIPS-97]: n component MDPs 〈Si, Ai, Pri, Ri, s0i〉; the composite MDP 〈S, A, Pr, R, s0〉 satisfies:
  - S = ∏_{i=1..n} Si, s0 = ∏_{i=1..n} s0i
  - A(s) ⊆ ∏_{i=1..n} Ai(s) (constraints on simultaneous actions)
  - Pr(s′ | a, s) = ∏_{i=1..n} Pri(s′i | ai, si) (transition independence)
  - R(s, a, s′) = ∑_{i=1..n} Ri(s, a, s′) (additive utility independence)
- useful for resource allocation [Meuleau et al. UAI-98]
- optimal solutions to component MDPs yield bounds for pruning composite MDPs (as in combo-elimination) [Singh & Cohn NIPS-97]
- the composite value function can be approximated as a linear combination of component value functions [Guestrin et al. NIPS-01]
References
How to Dynamically Merge Markov Decision Processes, S. Singh and D. Cohn. NIPS-97.
Solving Very Large Weakly Coupled Markov Decision Processes, N. Meuleau, M. Hauskrecht, K.-E. Kim, L. Peshkin, L. Kaelbling, and T. Dean. UAI-98.
Learning to Cooperate via Policy Search, L. Peshkin, K.-E. Kim, N. Meuleau, L. Kaelbling. UAI-00.
Multi-Agent Planning with Factored MDPs, C. Guestrin, D. Koller, and R. Parr. NIPS-01.
Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis, C.V. Goldman and S. Zilberstein. JAIR, 2004.
Solving Concurrent Markov Decision Processes, Mausam and D. Weld. AAAI-04.
Concurrent Probabilistic Planning in the Graphplan Framework, I. Little and S. Thiebaux. ICAPS-06.
Concurrent Probabilistic Temporal Planning with Policy-Gradients, D. Aberdeen and O. Buffet. ICAPS-07.
Probabilistic Temporal Planning
PART V: Durative Actions w/ Concurrency
Mausam, David E. Smith, Sylvie Thiébaux
Stochastic Planning
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP):
- CoMDP Model
- Value-Based Algorithms
- AND-OR Search Formulation
- Policy Gradient Approach
- Related Models
Concurrent Probabilistic Temporal Planning
Concurrency and time:
- durative actions
- timed effects
- concurrency
and uncertainty about:
- the effects
- their timing
- the action duration
Actions in CPTP

(:durative-action jump
 :parameters (?p - person ?c - parachute)
 :condition (and (at start (and (alive ?p)
                                (on ?p plane)
                                (flying plane)
                                (wearing ?p ?c)))
                 (over all (wearing ?p ?c)))
 :effect (and (at start (not (on ?p plane)))
              (at end (on ?p ground))
              (at 5 (probabilistic
                      (0.8 (at 42 (standing ?p)))
                      (0.2 (at 13 (probabilistic
                             (0.1 (at 14 (bruised ?p)))
                             (0.9 (at 14 (not (alive ?p)))))))))))

[Figure: the action as a network of duration distributions (e.g. U(3,5), N(4,1)), simple effects, probabilistic effects (25%/75%, 10%/90%), and conjunctive effects.]
Actions in CPTP: The Simplest Case
TGP-style action:
- preconditions hold at start and over all
- effects are only available at end
- duration is fixed or probabilistic
Additionally:
- effect-independent duration
- monotonic continuation (normal, uniform, exponential)
[Figure: example action with duration N(10,2), conditions Soft Soil Found and Need Blast Permit, and a probabilistic (25%/75%) effect Soil Test Done.]
Plans in CPTP
Decision Points in CPTP
Definitions:
- Pivot: time point at which an event might take place (an effect, or a condition being needed).
- Happening: time point at which an event actually takes place.
Completeness/Optimality Results [Mausam & Weld, AAAI-06]:
1. With TGP actions, decision points may be restricted to pivots.
2. With TGP actions and deterministic durations, decision points may be restricted to happenings.
3. Conjecture: the same holds with effect-independent durations and monotonic continuations.
4. In general, restriction to pivots may cause incompleteness.
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP):
- CoMDP Model
- Value-Based Algorithms
- AND-OR Search Formulation
- Policy Gradient Approach
- Related Models
CoMDP in Interwoven Epoch State Space
Why interwoven?
[Figure: aligned epochs vs interwoven epochs.]
The traditional aligned CoMDP model is suboptimal for CPTP.
CoMDP in Interwoven Epoch State Space
CoMDP state contains:
− current world state w
− event queue q, recording the advancement of executing actions (inspired by SAPA, TLPlan, HSP, etc.)

The event queue contains pairs:
− event e (simple effect, probabilistic effect, condition check, . . . )
− distribution for the duration remaining until e happens
[Figure: an example event queue on a timeline from 0 to 14: a condition check c at 3, and pending events whose remaining durations are distributed as N(2,1) and U(7,9).]
Queue for TGP actions with fixed durations:
q = {〈a, δ〉 | a is executing and will terminate in δ time units}
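A minimal Python sketch of such a queue (names and representation are hypothetical, not DUR's actual data structures):

```python
# Sketch of an interwoven-epoch event queue for TGP actions with fixed
# durations: pairs <action, time remaining until termination>.

def advance(queue, apply_effects):
    """Advance time to the next happening: apply the effects of every
    action terminating then, and decrement the remaining durations."""
    delta = min(d for _, d in queue)              # time of next happening
    for a, d in queue:
        if d == delta:
            apply_effects(a)                      # realise a's effects
    return delta, [(a, d - delta) for a, d in queue if d > delta]

effects = []
queue = [("dig", 5), ("drive", 2), ("nir", 2)]
delta, queue = advance(queue, effects.append)
# drive and nir terminate after 2 time units; dig has 3 units remaining
```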
![Page 116: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/116.jpg)
CoMDP in Interwoven Epoch State Space
A(s): as in the standard CoMDP, but including the empty set (wait); interference with the executing actions in the queue must be checked.

Pr: tedious to formalise (even for restricted cases), see [Mausam & Weld, JAIR-07]. It considers all possible states at all pivots between the minimum time at which an event could happen and the maximum time at which one is guaranteed to happen → this motivates sampling!
C(s, A, s′) : time elapsed between s and s′.
![Page 117: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/117.jpg)
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP)
− CoMDP Model
− Value-Based Algorithms
− AND-OR Search Formulation
− Policy Gradient Approach
− Related Models
![Page 118: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/118.jpg)
Value-Based Algorithms
DUR family of planners [Mausam & Weld ICAPS-05, JAIR-07]
Assumptions (to start with):
− TGP actions with fixed integer durations
⇒ decision points are happenings
⇒ the event queue records the remaining duration of each action

Sampled RTDP applies. To cope with the interwoven state-space blow-up:
1. heuristics
2. hybridisation
![Page 119: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/119.jpg)
Maximum Concurrency Heuristic
Divide the optimal serial-MDP cost by m, the maximum number of actions executable concurrently in the domain:

J*_iw(〈s, ∅〉) ≥ J*(s) / m
J*_iw(〈s, q〉) ≥ Q*(s, A_q) / m

(where J*_iw denotes the optimal value in the interwoven CoMDP)
[Example: running a, b, c concurrently costs 4; the serialisation a, b, c costs 4 + 1 + 2 = 7; dividing by the maximum concurrency 2 gives 7/2 < 4, an admissible estimate.]
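The slide's example can be checked with a trivial sketch (function name hypothetical):

```python
# Maximum Concurrency heuristic (sketch): divide the optimal cost of the
# serialised MDP by the maximum number of concurrently executable actions.
def mc_heuristic(serial_cost, max_concurrency):
    return serial_cost / max_concurrency

# Serialising {a, b, c} costs 4 + 1 + 2 = 7; at most 2 actions run
# concurrently, so the admissible estimate is 3.5, below the true cost 4.
h = mc_heuristic(4 + 1 + 2, 2)
```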
![Page 120: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/120.jpg)
Eager Effects Heuristic
Effects are realised when the fastest of the started actions ends; time advances accordingly.
CoMDP state: 〈world state after the effects, duration until the last executing action ends〉
Relaxed problem:
− gets information about effects ahead of time
− mutex action combinations are allowed (track of time is lost)
[Figure: actions a, b, c start in s with durations 8, 2, 4; when the fastest (b, after 2 units) ends, all effects are applied at once, giving the relaxed state (s′, 6), since the longest action still has 6 units to run.]
![Page 121: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/121.jpg)
Hybridisation
Hybrid interwoven/aligned policy for probable/improbable states:
1. Run interwoven RTDP for a number of trials → yields a lower bound L = J(s0)
2. Run aligned RTDP on the low-frequency states
3. Clean up and evaluate the hybrid policy π → yields an upper bound U = Jπ(s0)
4. Repeat until the performance ratio r is reached: (U − L)/L < r
[Figure: the interwoven policy covers the probable paths from s to the goal G; low-probability branches are handled by the aligned policy.]
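The stopping test is just the relative gap between the two bounds; a minimal sketch (names hypothetical):

```python
# Hybridisation stopping test (sketch): L is the RTDP lower bound J(s0),
# U is the hybrid policy's value Jpi(s0); stop once (U - L) / L < r.
def reached_ratio(L, U, r):
    return (U - L) / L < r

# e.g. bounds 10.0 and 10.5 satisfy a 10% performance ratio
done = reached_ratio(10.0, 10.5, 0.1)
```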
![Page 122: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/122.jpg)
Extensions of the DUR Planner
∆DUR [Mausam & Weld, AAAI-06, JAIR-07] extends DUR to TGP actions with stochastic durations.
MC and hybrid: apply with minor variations.
∆DURexp, the expected-duration planner:
− effect-independent durations & monotonic continuations
− assigns each action its (fixed) mean duration
− uses DUR to generate a policy and executes it:
  if an action terminates early, extend the policy from the current state;
  if an action is late to terminate, update the mean, then extend.

∆DURarch, the archetypal-duration planner:
− extends ∆DURexp to multimodal distributions
− probabilistic outcomes with different mean durations
![Page 123: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/123.jpg)
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP)
− CoMDP Model
− Value-Based Algorithms
− AND-OR Search Formulation
− Policy Gradient Approach
− Related Models
![Page 124: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/124.jpg)
AND/OR Search Formulation
Prottle [Little et al., AAAI-05]

Forward-search planner; solves CPTP over a finite horizon. Not radically different from DUR:
− finer characterisation of the search space for CPTP
− slightly different search algorithm (lower + upper bounds)
− planning-graph heuristics

Current implementation:
− handles general CPTP actions with fixed durations on arcs
− incomplete: only considers pivots
− takes the cost to be the probability of failure
![Page 125: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/125.jpg)
Prottle’s Search Space
Interwoven-epochs AND-OR graph:
− AND = chance, OR = choice
− node purposes: action selection or time advancement
− node contents: current state, current time, event queue
[Figure: alternating layers of selection (choice) and advancement (chance) nodes.]
![Page 126: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/126.jpg)
Prottle’s Search Space
[Figure: an example search space. A selection node at time 0 chooses among a1, a2; advancement nodes move time forward to 5, where outcomes o1, o2 resolve; further selection and advancement layers lead to outcomes o3–o6 at times 13 and 14.]
![Page 127: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/127.jpg)
Prottle’s Algorithm
Trial-based, with lower and upper bounds (BRTDP and FRTDP are similar). The selection strategy first quickly finds a likely path to the goal, then robustifies the known paths.
[Figure: the same example search space annotated with [lower, upper] bounds on the probability of failure: [0.1, 1.0] at the root, [0.0, 1.0] at unexplored nodes, [0.1, 0.2] along explored branches, [0.0, 0.0] at solved goal nodes, with outcome probabilities 80% and 20%.]
![Page 128: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/128.jpg)
Prottle’s Algorithm (details)
Node lower/upper cost bounds:
− cost = probability of failure
− bounds initialised using heuristics
Bound update rules:

Lchoice(n) := max(L(n), min_{n′∈S(n)} L(n′))
Uchoice(n) := min(U(n), min_{n′∈S(n)} U(n′))
Lchance(n) := max(L(n), ∑_{n′∈S(n)} Pr(n′) L(n′))
Uchance(n) := min(U(n), ∑_{n′∈S(n)} Pr(n′) U(n′))
cost converges when U(n) − L(n) ≤ ε
Node labels: solved, failure (solved with cost 1), unsolved.
Node selection: minimises P(n)U(n), using P(n)L(n) to break ties.
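A minimal sketch of these update rules (the (Pr, L, U) successor representation is an assumption, not Prottle's actual data structure):

```python
# Prottle-style bound updates (sketch). Cost is the probability of
# failure, so bounds live in [0, 1]. succ holds (Pr, L, U) triples for
# the successors S(n); choice nodes ignore the probabilities.

def update_choice(L, U, succ):
    # a choice node takes its best (minimum-cost) successor
    return (max(L, min(l for _, l, _ in succ)),
            min(U, min(u for _, _, u in succ)))

def update_chance(L, U, succ):
    # a chance node takes the expectation over its successors
    return (max(L, sum(p * l for p, l, _ in succ)),
            min(U, sum(p * u for p, _, u in succ)))

succ = [(0.8, 0.0, 0.5), (0.2, 0.5, 1.0)]
choice_bounds = update_choice(0.0, 1.0, succ)   # (0.0, 0.5)
chance_bounds = update_chance(0.0, 1.0, succ)   # ≈ (0.1, 0.6)
```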
![Page 129: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/129.jpg)
Prottle’s Heuristic
Based on a probabilistic temporal planning graph.
Backward propagation rules:

Co(n, i) := ∏_{n′∈S(n)} Cp,o(n′, i)
Ca(n, i) := ∑_{n′∈S(n)} Pr(n′) Co(n′, i)
Cp(n, i) := ∏_{n′∈S(n)} Ca(n′, i)
[Figure: a probabilistic temporal planning graph over propositions p1–p4, actions a1, a2, and outcomes o1–o4 with probabilities 80%/20% and 60%/40%, shown alongside the search-space fragment whose node costs it initialises.]
![Page 130: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/130.jpg)
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP)
− CoMDP Model
− Value-Based Algorithms
− AND-OR Search Formulation
− Policy Gradient Approach
− Related Models
![Page 131: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/131.jpg)
Policy Gradient Approach
Minimises the expected cost of a factored parameterised policy by gradient descent in the parameter space.
[Figure: FPG architecture. The world/simulator emits observations o and a cost c; one parameterised network per action outputs choice probabilities, e.g. Pr(a1=yes | o1,w1) = 0.1, Pr(a1=no | o1,w1) = 0.9, Pr(a3=yes | o2,w2) = 0.5, with Pr(a2=no | o2,w2) = 1.0 when a2 is not eligible (choice disabled); the sampled action set A drives the simulator, which tracks the world state and the eligible actions, through Pr(s′ | A, s).]
![Page 132: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/132.jpg)
Factored Policy Gradient for CPTP
FPG handles continuous time distributions [Aberdeen & Buffet, ICAPS-07]:
1. the simulator manages an event queue
2. the cost function takes durations into account
[Figure: the same architecture, with an event queue added to the simulator state and a cost c that includes time.]
![Page 133: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/133.jpg)
Plan for Part V
Concurrent Probabilistic Temporal Planning (CPTP)
− CoMDP Model
− Value-Based Algorithms
− AND-OR Search Formulation
− Policy Gradient Approach
− Related Models
![Page 134: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/134.jpg)
Related Models
Generalised Semi-MDP (GSMDP) [Younes & Simmons, AAAI-04]
− set of states S
− set of events E; each event e is associated with:
  Φe(s): enabling condition
  Ge(t): probability that e remains enabled for t time units before it triggers
  Pr(s′ | e, s): transition probability when e triggers in s
− actions A ⊆ E are the controllable events
− rewards:
  lump-sum reward k(s, e, s′) for transitions
  continuous reward rate c(a, s) for a ∈ A being enabled in s
− discounted infinite-horizon model; a reward at time t counts as e^{−αt}
− policy: maps timed histories to a set of enabled actions
![Page 135: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/135.jpg)
Generalised Semi-Markov Decision Process
The parallel (asynchronous) composition of SMDPs is a GSMDP: the distribution of an enabled event may depend on history.
MDP: memoryless delays, probabilistic effects
SMDP: general delays, probabilistic effects
GSMDP: general delays, probabilistic effects, concurrency
[Figure: the parallel composition of two SMDPs, one over office/not-office with move delays U(0,6) and one over wet/not-wet with a make-wet delay W(2), yields a GSMDP over the product state space.]
![Page 136: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/136.jpg)
Generalised Semi-Markov Decision Process
Specificities:
− asynchronous systems
− discrete/continuous time

Solution methods:
− approximate the distributions with phase-type distributions and solve the resulting MDP [Younes & Simmons, AAAI-04]
  → to know more: attend Håkan’s Dissertation Award talk!
− incremental generate - test (statistical sampling) - debug [Younes & Simmons, ICAPS-04]
  → covered by David
![Page 137: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/137.jpg)
References
Policy Generation for Continuous Time Domains with Concurrency, H. Younes and R. Simmons. ICAPS-04.

Solving Generalized Semi-Markov Processes using Continuous Phase-Type Distributions, H. Younes and R. Simmons. AAAI-04.

Prottle: A Probabilistic Temporal Planner, I. Little, D. Aberdeen, and S. Thiébaux. AAAI-05.

Concurrent Probabilistic Temporal Planning, Mausam and D. Weld. ICAPS-05.

Probabilistic Temporal Planning with Uncertain Durations, Mausam and D. Weld. AAAI-06.

Concurrent Probabilistic Temporal Planning with Policy-Gradients, D. Aberdeen and O. Buffet. ICAPS-07.

Planning with Durative Actions in Uncertain Domains, Mausam and D. Weld. JAIR, to appear, 2007.
![Page 138: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/138.jpg)
Probabilistic Temporal Planning
PART VI: Practical Considerations
![Page 139: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/139.jpg)
Outline
− Incremental approaches
− When is contingency planning really needed?
− Combining contingency planning & replanning
− Applications
![Page 140: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/140.jpg)
Problem Dimensions
[Figure: an agent decides “What action next?”, connected to its environment through percepts and actions, positioned along these dimensions:]
− Static vs. Dynamic
− Full vs. Partial satisfaction
− Fully vs. Partially Observable
− Perfect vs. Noisy
− Deterministic vs. Stochastic
− Instantaneous vs. Durative
− Sequential vs. Concurrent
− Discrete vs. Continuous Outcomes
− Predictable vs. Unpredictable
![Page 141: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/141.jpg)
Problem Dimensions
[Same figure as on the previous slide.]
![Page 142: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/142.jpg)
Can We Make it Discrete?
Drive(30, 52) has three discrete outcomes: O1 left of nominal (p = .2), O2 nominal (p = .6), O3 right of nominal (p = .2).
![Page 143: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/143.jpg)
What does “nominal” mean?
Drive (30, 52)
Collect
![Page 144: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/144.jpg)
What does “nominal” mean?
Drive (30, 52)
Picture
![Page 145: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/145.jpg)
Depends on Objective
[Same Drive(30, 52) figure: O1 left of nominal (.2), O2 nominal (.6), O3 right of nominal (.2).]
![Page 146: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/146.jpg)
Outline
− Incremental approaches: JIC, ICP, Tempastic
− When is contingency planning needed?
− Combining contingency planning & replanning
− Applications
![Page 147: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/147.jpg)
Incremental Approaches
[Loop: deterministic relaxation → deterministic planner → plan → stochastic simulation → identify weakness → solve/merge, and repeat.]
![Page 148: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/148.jpg)
Differences
[The same loop as on the previous slide; the approaches differ in how each step is realised.]

JIC, ICP, Tempastic, Opportunistic (Long/Fox), TCP (Foss/Onder)
![Page 149: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/149.jpg)
Outline
− Incremental approaches: JIC, ICP, Tempastic
− When is contingency planning needed?
− Combining contingency planning & replanning
− Applications
![Page 150: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/150.jpg)
Just in Case (JIC) Scheduling
[Figure: a telescope schedule over observations Obs44, Obs17, Obs2, Obs23, Obs9, with a time window [1:30, 2:20] and a :40 duration shown.]

Observation Scheduling:
− many observations, priority 1–5
− time windows, stochastic durations
− sky conditions, time constraints

Ref: Drummond, Bresina, & Swanson, AAAI-94
![Page 151: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/151.jpg)
The JIC Algorithm
1. Build a seed schedule
2. Identify the most likely failure (the figure marks failure probabilities .1, .4, .2 along the schedule)
3. Generate a contingency branch
4. Incorporate the branch

Advantages: tractability, simple schedules, anytime

Ref: Drummond, Bresina, & Swanson, AAAI-94
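The loop can be sketched as follows (all names and the schedule representation are hypothetical):

```python
# JIC loop (sketch): repeatedly add a contingency branch at the
# not-yet-covered step of the schedule most likely to fail.
def jic(seed, plan_branch, n_branches):
    branches = {}
    for _ in range(n_branches):
        step = max((s for s in seed if s not in branches),
                   key=lambda s: seed[s])        # most likely failure
        branches[step] = plan_branch(step)       # contingency for that step
    return branches

seed = {"obs17": 0.1, "obs2": 0.4, "obs23": 0.2}   # step -> P(failure)
plan = jic(seed, lambda s: "branch-for-" + s, 2)
# branches are added at obs2 (p = .4), then obs23 (p = .2)
```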
![Page 152: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/152.jpg)
Limits of the JIC Heuristic

Most probable failure points may not be the best branch points: it is often too late to attempt other goals when the plan is about to fail.

[Figure: a rover plan (Visual servo, Dig(60), Drive(−2), Lo res, Rock finder, LIB, NIR, HiRes, Warmup LIB) with stochastic durations (e.g. µ = 600s, σ = 60s), time windows (e.g. t ∈ [10:00, 14:00]) and values from V = 5 to V = 100, plus a plot of expected utility against start time. Μ marks the most probable failures; $ marks the most useful branch point.]
![Page 153: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/153.jpg)
Outline
− Incremental approaches: JIC, ICP, Tempastic
− When is contingency planning needed?
− Combining contingency planning & replanning
− Applications
![Page 154: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/154.jpg)
Incremental Contingency Planning
1. Build a seed plan
2. Identify the best branch point (construct a plangraph, back-propagate value tables, compute the gain)
3. Generate a contingency branch
4. Evaluate & integrate the branch
![Page 155: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/155.jpg)
Back-Propagate Value Tables
[Figure: value tables v(r) for goals g1–g4 (with values V1–V4) are propagated backwards through the plan to the current resource level r.]
![Page 156: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/156.jpg)
Simple Back-Propagation

V(r′) = ∫0^∞ Pc(r) V(r′ − r) dr

[Figure: convolving an action's cost distribution p (e.g. mass .1 over r = 5..15, or .2 over r = 5..10) with the downstream value table v(r) turns a step over r = 5..15 into a ramp over r = 10..25.]
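On a discrete resource grid the back-propagation rule becomes a convolution; a minimal sketch (representation assumed, not the paper's implementation):

```python
# Discretised back-propagation (sketch): V'(r) = sum_t Pc(t) * V(r - t),
# the discrete analogue of V(r') = integral Pc(r) V(r' - r) dr.

def back_propagate(V, Pc):
    """V: downstream value table over resource levels 0..n-1;
    Pc: the action's cost distribution over the same grid."""
    return [sum(p * V[r - t] for t, p in enumerate(Pc) if r - t >= 0)
            for r in range(len(V))]

V = [0, 0, 0, 1, 1, 1]      # value 1 once at least 3 resource units remain
Pc = [0.0, 0.5, 0.5]        # the action costs 1 or 2 units, equally likely
table = back_propagate(V, Pc)   # [0, 0, 0, 0, 0.5, 1.0]
```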
![Page 157: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/157.jpg)
Conjunctions
[Figure: back-propagation through a conjunction p ∧ q: the value tables from the subplans achieving p (via s) and q (via t) are combined, each table labelled with the subgoals still outstanding, e.g. {q} or {t}.]
![Page 158: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/158.jpg)
Estimating Branch Value
[Figure: the branch value Vb(r) is the pointwise Max of the value tables V1–V4 of the goals achievable from the branch point.]
![Page 159: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/159.jpg)
Expected Branch Gain

Gain = ∫0^∞ P(r) max{0, Vb(r) − Vm(r)} dr

[Figure: at the branch condition, the resource distribution P(r) is integrated against the improvement of the branch value Vb over the main-line value Vm.]
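A discretised version of the gain computation (sketch; the grid and the values are made up for illustration):

```python
# Discretised branch gain (sketch):
# Gain = sum_r P(r) * max(0, Vb(r) - Vm(r)), the discrete analogue of
# the integral of P(r) max{0, Vb(r) - Vm(r)} dr.

def branch_gain(P, Vb, Vm):
    return sum(p * max(0.0, vb - vm) for p, vb, vm in zip(P, Vb, Vm))

P  = [0.25, 0.5, 0.25]   # resource distribution at the branch condition
Vb = [3.0, 2.0, 1.0]     # branch value
Vm = [1.0, 2.0, 2.0]     # main-line value
gain = branch_gain(P, Vb, Vm)   # only the first point improves: 0.25 * 2.0
```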
![Page 160: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/160.jpg)
Identifying the Best Branch Point
[The four-step ICP loop, with step 2 expanded: construct a plangraph, back-propagate the value tables, compute the gain.]
![Page 161: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/161.jpg)
Generating the Branch
[The four-step ICP loop, with step 3 expanded: plan for the branch condition.]
![Page 162: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/162.jpg)
Evaluating the Branch
[The four-step ICP loop, with step 4 expanded: compute the branch's value function and its actual gain.]
![Page 163: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/163.jpg)
Outline
− Incremental approaches: JIC, ICP, Tempastic
− When is contingency planning needed?
− Combining contingency planning & replanning
− Applications
![Page 164: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/164.jpg)
Tempastic
[Loop: generate an initial policy → test whether the policy is good → if bad, debug and repair the policy, and repeat.]

Ref: Younes & Simmons, ICAPS-04
![Page 165: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/165.jpg)
Tempastic Details
− Generate the initial policy: solve a deterministic problem and use the plan as training data to generate the policy.
− Test whether the policy is good: stochastic simulation.
− Debug and repair: rank the bugs, adapt the deterministic problem, solve it, and use the result as training data to improve the policy; repeat until good.
![Page 166: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/166.jpg)
Policy Generation
1. Probabilistic planning problem → deterministic planning problem: split discrete outcomes (e.g. outcomes O1 w.p. .4 and O2 w.p. .6 of action A become separate actions), relax continuous outcomes.
2. Solve using VHPOP → temporal plan.
3. Generate training data by simulating the plan → state-action pairs (s0: A4, s1: A7, s2: A1, s3: A5, . . . ).
4. Decision tree learning → policy (a decision tree).
![Page 167: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/167.jpg)
Policy Tree
[Figure: the learned decision tree branches on predicates such as at(me, cmu), at(pgh-taxi, cmu), at(me, pgh-airport), in(me, plane), moving(pgh-taxi, cmu, pgh-airport), at(mpls-taxi, mpls-airport), at(plane, mpls-airport), at(me, mpls-airport), moving(mpls-taxi, mpls-airport, honeywell); its leaves prescribe enter-taxi, depart-taxi, leave-taxi, check-in, or idle.]
![Page 168: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/168.jpg)
Tempastic Details
[Same diagram as on the previous Tempastic Details slide.]
![Page 169: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/169.jpg)
Policy Debugging
1. Sample execution paths of the current policy.
2. Sample path analysis → failure scenarios.
3. Solve a deterministic planning problem that takes the failure scenario into account → temporal plan.
4. Generate training data by simulating the plan → state-action pairs.
5. Incremental decision tree learning → revised policy.
![Page 170: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/170.jpg)
Policy Debugging Details
Sample path analysis:
− construct a Markov chain from the sampled paths (here: s0 → s1 w.p. 2/3, s0 → s3 w.p. 1/3, s1 → s2 w.p. 1/2, s1 → s4 w.p. 1/2, s4 → s2 w.p. 1)
− run Bellman backups on it
− incorporate the most important failure & force the planner to work around it
![Page 171: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/171.jpg)
Sample Path Analysis: Example
Sample paths (γ = 0.9):
  s0 →e1 s1 →e2 s2
  s0 →e1 s1 →e4 s4 →e2 s2
  s0 →e3 s3
Markov chain: s0→s1: 2/3, s0→s3: 1/3, s1→s2: 1/2, s1→s4: 1/2, s4→s2: 1
State values:
  V(s0) = –0.213
  V(s1) = –0.855
  V(s2) = –1
  V(s3) = +1
  V(s4) = –0.9
Event values:
  V(e1) = 2·(V(s1) – V(s0)) = –1.284
  V(e2) = (V(s2) – V(s1)) + (V(s2) – V(s4)) = –0.245
  V(e3) = V(s3) – V(s0) = +1.213
  V(e4) = V(s4) – V(s1) = –0.045
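The numbers above can be checked directly: back up the terminal values V(s2) = –1 and V(s3) = +1 through the chain with γ = 0.9, then score each event by summing the value changes over all of its occurrences in the sample paths.

```python
gamma = 0.9

# Terminal values from the example; back the rest up through the chain.
V = {"s2": -1.0, "s3": 1.0}
V["s4"] = gamma * V["s2"]                                   # s4 -> s2 w.p. 1
V["s1"] = 0.5 * gamma * V["s2"] + 0.5 * gamma * V["s4"]     # s1 -> s2 or s4
V["s0"] = (2/3) * gamma * V["s1"] + (1/3) * gamma * V["s3"]

# Event value: sum of value changes over every occurrence in the paths
# (e1 occurs in two of the three sample paths, hence the factor 2).
E = {
    "e1": 2 * (V["s1"] - V["s0"]),
    "e2": (V["s2"] - V["s1"]) + (V["s2"] - V["s4"]),
    "e3": V["s3"] - V["s0"],
    "e4": V["s4"] - V["s1"],
}
# V ≈ {s0: -0.213, s1: -0.855, s2: -1, s3: 1, s4: -0.9}
# E ≈ {e1: -1.284, e2: -0.245, e3: 1.213, e4: -0.045}
```

The most negative event (here e1) is the "most important failure" that the debugging loop feeds back to the planner.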
![Page 172: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/172.jpg)
Revised Policy Tree (decision tree over state predicates)
at(pgh-taxi, cmu)
at(me, cmu)
…
enter-taxi / depart-taxi
has-reservation(me, plane)
make-reservation / leave-taxi
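One way to picture the learned policy tree is as nested predicate tests with action leaves. The tree below is only an illustrative sketch (the slide's exact branch structure is not fully recoverable), but it shows how such a policy would be evaluated.

```python
# Internal nodes are (predicate, yes_subtree, no_subtree); leaves are
# action names. The branch structure here is hypothetical.
tree = ("at(pgh-taxi, cmu)",
        ("has-reservation(me, plane)", "enter-taxi", "make-reservation"),
        "depart-taxi")

def act(tree, state):
    """Walk the tree from the root; state is a set of true predicates."""
    while isinstance(tree, tuple):
        test, yes, no = tree
        tree = yes if test in state else no
    return tree

act(tree, {"at(pgh-taxi, cmu)", "has-reservation(me, plane)"})  # -> "enter-taxi"
```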
![Page 173: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/173.jpg)
Tempastic Summary
Generate initial policy: solve deterministic problem; use as training data to generate policy
Test if policy is good: stochastic simulation (good: done; bad: repeat)
Debug and repair policy: rank bugs; adapt the deterministic problem; solve it; use as training data to improve policy
![Page 174: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/174.jpg)
Advantages & Drawbacks
Advantages:
  Tractability
  Anytime
  Simple plans
Drawbacks:
  Sacrifice optimality (seed plan, repairs)
  Thrashing in flaw selection, particularly for oversubscription
![Page 175: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/175.jpg)
Outline
Incremental approaches
When is contingency planning really needed?
Combining contingency planning & replanning
Applications
![Page 176: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/176.jpg)
Alternative Approaches
Replanning
Improving robustness: conservatism, flexibility, conformance, conditionality
![Page 177: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/177.jpg)
Alternative Approaches
Replanning
Improving robustness: conservatism, flexibility, conformance, conditionality
Not mutually exclusive
Which one when?
![Page 178: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/178.jpg)
Requirements & Drawbacks

| Approach | Requirements | Drawbacks |
| --- | --- | --- |
| Replanning | Adequate time, computational power; time not a critical resource; no dead ends | Lost opportunity; non-optimal; failure |
| Improving robustness: conservatism | | Lost opportunity; resource usage |
| Improving robustness: flexibility | Limited uncertainty; sophisticated rep., planner, exec | Weak; computational |
| Conformant | Limited uncertainty; powerful actions | Weak; computational |
| Contingency | Model outcomes; # of outcomes small | Computational |
![Page 179: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/179.jpg)
When?

| Approach | When | ISS Examples |
| --- | --- | --- |
| Replanning | Minor annoyances: reversible outcomes, low penalty; rich opportunities; highly stochastic | Misplaced supplies; loading, storage; job jar; obstacle avoidance |
| Improving robustness: conservatism | Critical resource | O2, H2O, food, power |
| Improving robustness: flexibility | Duration uncertainty; event time uncertainty | Daily tasks; communication |
| Conformant | Simple forcing actions | Computer reset |
| Contingency | Only critical situations | Power inverter failure; pressure leak; fire |
![Page 180: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/180.jpg)
Point?
\start{soapbox}
Considered within a larger context: replanning
Different emphasis: unrecoverable outcomes (not just high-probability / low-value outcomes)
![Page 181: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/181.jpg)
Impacts for Policy Search
Considered within a larger context: replanning
Different emphasis: unrecoverable outcomes (not just high-probability / low-value outcomes)
1. Don't care about having a complete policy
2. Policy must cover critical outcomes
\end{soapbox}
![Page 182: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/182.jpg)
Outline
Incremental approaches
When is contingency planning really needed?
Combining contingency planning & replanning
Applications
![Page 183: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/183.jpg)
Precautionary Planning
Generate high-probability deterministic seed plan
Identify & repair unrecoverable outcomes (successful → execute; repair impossible → replan from current state)
Execute next step (successful → continue; unexpected outcome → replan from current state)
Ref: Foss, Onder & Smith, ICAPS-07 Wkshp
![Page 184: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/184.jpg)
Seed Plan Generation
(loop: generate seed plan → identify & repair unrecoverable outcomes → execute next step → replan on unexpected outcome)
Split discrete outcomes: an action A with outcomes O1 (p = .4) and O2 (p = .6) becomes deterministic actions O1A and O2A
Take expectations (continuous quantities)
Assign costs –log(.4) and –log(.6)
Invoke LPG-TD
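The –log(p) cost assignment works because costs add while probabilities multiply, so a minimum-cost deterministic plan is a maximum-probability plan. A minimal sketch:

```python
import math

def outcome_costs(probs):
    """Map each outcome's probability to an additive cost of -log(p)."""
    return {name: -math.log(p) for name, p in probs.items()}

costs = outcome_costs({"O1A": 0.4, "O2A": 0.6})

# A plan whose steps use outcomes with probabilities p1, p2, ... has
# total cost -log(p1) - log(p2) - ... = -log(p1 * p2 * ...), so the
# plan's success probability is recovered as exp(-cost):
plan_cost = costs["O1A"] + costs["O2A"]
plan_prob = math.exp(-plan_cost)   # 0.4 * 0.6 = 0.24
```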
![Page 185: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/185.jpg)
Unrecoverable Outcomes
(loop: generate seed plan → identify & repair unrecoverable outcomes → execute next step → replan on unexpected outcome)
1. Evaluate goal reachability in the PG for each outcome
2. Regress conditions to form a forcing goal G'; create a new action A'R
3. Invoke LPG-TD
![Page 186: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/186.jpg)
Execution
(loop: generate seed plan → identify & repair unrecoverable outcomes → execute next step → replan on unexpected outcome)
Replanning from the current state uses a limited horizon
![Page 187: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/187.jpg)
Unplanned Outcomes
(loop: generate seed plan → identify & repair unrecoverable outcomes → execute next step → replan on unexpected outcome)
Replanning from the current state reuses the seed-plan machinery: split discrete outcomes, assign costs, invoke LPG-TD
![Page 188: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/188.jpg)
Main Points
(loop: generate seed plan → identify & repair unrecoverable outcomes → execute next step → replan on unexpected outcome)
ICP combined with replanning
Deterministic planner
Repair unrecoverable outcomes
Ref: Foss, Onder & Smith, ICAPS-07 Wkshp
![Page 189: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/189.jpg)
Outline
Incremental approaches
When is contingency planning really needed?
Combining contingency planning & replanning
Applications
  Military air campaign planning [Meuleau et al AAAI-98]
  Military operations planning [Aberdeen et al ICAPS-04]
  Rover planning [Pedersen et al IEEEaero-05] [Meuleau et al AAAI-04 Wkshp]
![Page 190: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/190.jpg)
Military Air Campaign Planning [Meuleau et al AAAI-98]
Customer: DARPA
Problem:
  military targets with time windows
  limited number of weapons (bombs) & aircraft
  strike outcomes uncertain, but observable
  objective: allocate aircraft & bombs to targets at each time step
Approach: Markov Task Decomposition (MTD)
  offline: solve parameterized MDPs for each target
  at each time step, allocate weapons across targets
Results:
  synthetic problems: 1000 targets, 10,000 weapons, 100 planes
  35 minutes; quality close to DP
Features: concurrency (1000), unit-time actions, discrete outcomes
![Page 191: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/191.jpg)
Military Operations Planning [Aberdeen et al ICAPS-04]
Customer: Australian Defence Science & Technology Organisation
Problem:
  set of military objectives (propositions)
  tasks (durative actions) make propositions true/false
  objective: achieve goals; minimize failure, makespan, resource cost
Approach:
  LRTDP
  admissible heuristics: probability, makespan, resource usage
  pruning of states not recently visited (LRU)
Results:
  synthetic problems (85) & military scenarios (2)
  biggest: 41 tasks, 51 facts, 19 resource types; 10 minutes
Features: concurrency (8), durative actions, discrete outcomes
![Page 192: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/192.jpg)
Rover Planning [Pedersen et al IEEEaero-05]
Customer: NASA
Problem:
  set of science goals w/ utilities, time constraints
  time & energy limitations
  duration & resource usage uncertain (driving)
  objective: maximize scientific reward
Approach:
  ICP w/ EUROPA planner
  heuristics: branch selection (utility drop), goal selection (orienteering)
Results:
  simulator problems w/ up to 20 objectives
  K9 rover: small problems (5 objectives)
Features: durative actions, continuous outcomes, oversubscription, minor concurrency
![Page 193: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/193.jpg)
Planner Architecture
Contingency Planner (branch selection, condition selection, goal selection)
β–planner exchanges constraints with the EUROPA Constraint Engine
Plan fragments are sent to a Monte Carlo Simulator, which returns an evaluation (P, r, V)
![Page 194: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/194.jpg)
Contingency Plan
![Page 195: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/195.jpg)
Rover Planning [Meuleau et al AAAI-04 Wkshp]
Customer: NASA
Problem:
  set of science goals w/ utilities
  objective: maximize scientific reward
Approach:
  plangraph construction
  DP regression of utility tables through PG
Results:
  synthetic problems w/ up to 5 objectives, 75 paths
  40s
Features: oversubscription, concurrency
[figure: plangraph with goals g1–g4 and utility tables V1–V4 regressed through action layers]
Outline
Incremental approaches
When is contingency planning really needed?
Combining contingency planning & replanning
Applications
The End.
![Page 197: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/197.jpg)
References – Incremental Approaches
Dearden, R.; Meuleau, N.; Ramakrishnan, S.; Smith, D.; and Washington, R. Incremental contingency planning. ICAPS-03 Wkshp on Planning under Uncertainty and Incomplete Information.
Drummond, M.; Bresina, J.; and Swanson, K. Just-In-Case scheduling. AAAI-94.
Foss, J., and Onder, N. A hill-climbing approach to planning with temporal uncertainty. FLAIRS-06.
Foss, J.; Onder, N.; and Smith, D. Preventing unrecoverable failures through precautionary planning. ICAPS-07 Wkshp on Moving Planning and Scheduling Systems into the Real World.
Long, D., and Fox, M. Single-trajectory opportunistic planning under uncertainty. 2002 UK Planning and Scheduling SIG.
Younes, H., and Simmons, R. Policy generation for continuous-time stochastic domains with concurrency. ICAPS-04.
![Page 198: Probabilistic Temporal Planningmausam/papers/tut07.pdfProbabilistic Temporal Planning PART II: Introduction to Probabilistic Planning Algorithms Mausam David E. Smith Sylvie Thiébaux](https://reader034.fdocuments.net/reader034/viewer/2022042113/5e8eb2f7597ef6335a03b0e5/html5/thumbnails/198.jpg)
References – Applications
Aberdeen, D.; Thiébaux, S.; and Zhang, L. Decision theoretic military operations planning. ICAPS-04.
Meuleau, N.; Dearden, R.; and Washington, R. Scaling up decision theoretic planning to planetary rover problems. AAAI-04 Wkshp on Learning and Planning in Markov Processes: Advances and Challenges.
Meuleau, N.; Hauskrecht, M.; Kim, K.; Peshkin, L.; Kaelbling, L.; Dean, T.; and Boutilier, C. Solving very large weakly coupled Markov Decision Processes. AAAI-98.
Pedersen, L.; Smith, D.; Dean, M.; Sargent, R.; Kunz, C.; Lees, D.; and Rajagopalan, S. Mission planning and target tracking for autonomous instrument placement. 2005 IEEE Aerospace Conf.