Planning Under Uncertainty
Transcript of Planning Under Uncertainty
![Page 1: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/1.jpg)
Planning Under Uncertainty
![Page 2: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/2.jpg)
Sensing error
Partial observability
Unpredictable dynamics
Other agents
![Page 3: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/3.jpg)
Three Strategies for Handling Uncertainty
Reactive strategies
Replanning
Proactive strategies
Anticipating future uncertainty
Shaping future uncertainty (active sensing)
![Page 4: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/4.jpg)
Uncertainty in Dynamics
This class: the robot has uncertain dynamics
Calibration errors, wheel slip, actuator error, unpredictable obstacles
Perfect (or good enough) sensing
![Page 5: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/5.jpg)
Modeling imperfect dynamics
Probabilistic dynamics model
P(x'|x,u): a probability distribution over successors x', given state x and control u (Markov assumption)
Nondeterministic dynamics model
f(x,u) -> a set of possible successors
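The two model families above can be sketched in a few lines of Python. This is a hypothetical 1-D grid example, not from the slides: the action u moves the robot by one cell, and the slip probability 0.2 is an assumed number.

```python
# Probabilistic dynamics model: P(x' | x, u) represented as a dict
# mapping each successor x' to its probability. Hypothetical example:
# with probability 0.2 the wheel slips and the robot stays put.
def P(x, u):
    return {x + u: 0.8, x: 0.2}

# Nondeterministic dynamics model: f(x, u) -> the SET of possible
# successors, with no probabilities attached.
def f(x, u):
    return {x + u, x}

probs = P(3, 1)
assert abs(sum(probs.values()) - 1.0) < 1e-9  # a valid distribution
assert f(3, 1) == set(probs)                  # same support, no weights
```

The nondeterministic model is exactly the support of the probabilistic one; it is weaker (it cannot express that slip is rare) but needs no calibration data.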
![Page 6: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/6.jpg)
Target Tracking
The robot must keep a target in its field of view
The robot has a prior map of the obstacles
But it does not know the target’s trajectory in advance
![Page 7: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/7.jpg)
Target-Tracking Example
Time is discretized into small steps of unit duration
At each time step, each of the two agents moves by at most one increment along a single axis
The two moves are simultaneous
The robot senses the new position of the target at each step
The target is not influenced by the robot (non-adversarial, non-cooperative target)
![Page 8: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/8.jpg)
Time-Stamped States (no cycles possible)
State = (robot-position, target-position, time)
In each state, the robot can execute 5 possible actions: {stop, up, down, right, left}
Each action has 5 possible outcomes (one for each possible action of the target), with some probability distribution
[Potential collisions are ignored to simplify the presentation]
Example: from state ([i,j], [u,v], t), the action right leads to:
• ([i+1,j], [u,v], t+1)
• ([i+1,j], [u-1,v], t+1)
• ([i+1,j], [u+1,v], t+1)
• ([i+1,j], [u,v-1], t+1)
• ([i+1,j], [u,v+1], t+1)
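The successor set for one robot action can be enumerated directly. A minimal sketch of the slide's "right" case (function name is hypothetical):

```python
# Successors of the time-stamped state ([i,j], [u,v], t) under the robot
# action "right": the robot moves to [i+1,j]; the target either stays or
# moves one increment along a single axis -- 5 outcomes, as on the slide.
def successors_right(i, j, u, v, t):
    target_moves = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    return [((i + 1, j), (u + du, v + dv), t + 1) for du, dv in target_moves]

succ = successors_right(0, 0, 5, 5, 0)
assert len(succ) == 5
assert all(s[0] == (1, 0) and s[2] == 1 for s in succ)
```

Each outcome would carry a probability from the target's motion model; the full branching factor per state is 5 actions x 5 outcomes.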
![Page 9: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/9.jpg)
Rewards and Costs
The robot must keep seeing the target as long as possible
Each state where it does not see the target is terminal
The reward collected in every non-terminal state is 1; it is 0 in each terminal state
[ The sum of the rewards collected in an execution run is exactly the amount of time the robot sees the target]
No cost for moving vs. not moving
![Page 10: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/10.jpg)
Expanding the state/action tree
[Diagram: the state/action tree is expanded from horizon 1 to horizon h]
![Page 11: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/11.jpg)
Assigning rewards
Terminal states: states where the target is not visible
Rewards: 1 in non-terminal states; 0 in others
But how to estimate the utility of a leaf at horizon h?
![Page 12: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/12.jpg)
Estimating the utility of a leaf
Compute the shortest distance d for the target to escape the robot’s current field of view
If the maximal velocity v of the target is known, estimate the utility of the state as d/v [a conservative estimate]
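The escape distance d can be computed by breadth-first search from the target's cell to the nearest cell outside the field of view. This is a sketch under assumed interfaces: the grid, the visible-cell set, and the bounds predicate are all hypothetical.

```python
from collections import deque

# d = shortest number of grid steps for the target to leave the robot's
# current field of view, found by BFS over 4-connected cells.
def escape_distance(target, visible, in_bounds):
    frontier = deque([(target, 0)])
    seen = {target}
    while frontier:
        (x, y), d = frontier.popleft()
        if (x, y) not in visible:
            return d  # first cell reached that is outside the field of view
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (nx, ny) not in seen and in_bounds(nx, ny):
                seen.add((nx, ny))
                frontier.append(((nx, ny), d + 1))
    return float("inf")  # the target can never escape

# Hypothetical 4x4 visible patch inside a slightly larger world
visible = {(x, y) for x in range(4) for y in range(4)}
d = escape_distance((1, 1), visible, lambda x, y: -1 <= x <= 4 and -1 <= y <= 4)
v_max = 1                  # assumed maximal target speed (cells per step)
utility = d / v_max        # the slide's conservative leaf estimate
```

With obstacles, cells occupied by obstacles would simply be excluded by the bounds predicate.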
![Page 13: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/13.jpg)
Selecting the next action
...
horizon 1 horizon h
Compute the optimal policy over the state/action tree using estimated utilities at leaf nodes
Execute only the first step of this policy
Repeat everything again at t+1… (sliding horizon)
Real-time constraint: h is chosen so that a decision can be returned in unit time [a larger h may result in a better decision that arrives too late!]
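The sliding-horizon step can be sketched as an expectimax over the depth-h tree: compute the value of each action, return only the first action, and replan at t+1. All interfaces here (actions, successors, reward, leaf estimate) are assumed, generic callbacks, not an API from the slides.

```python
# One sliding-horizon decision: expectimax to depth h, execute first action.
def best_action(state, h, actions, successors, reward, is_terminal, leaf_estimate):
    def value(s, depth):
        if is_terminal(s):
            return reward(s)          # e.g., 0 once the target is lost
        if depth == 0:
            return leaf_estimate(s)   # e.g., the conservative d/v estimate
        return reward(s) + max(
            sum(p * value(s2, depth - 1) for p, s2 in successors(s, u))
            for u in actions(s))
    return max(actions(state),
               key=lambda u: sum(p * value(s2, h - 1)
                                 for p, s2 in successors(state, u)))
```

Toy usage (states are integers, moving to 3 loses the target): from state 2 with h=2, the planner prefers staying (action 0) over stepping into the terminal state (action 1).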
![Page 14: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/14.jpg)
Pure Visual Servoing
![Page 15: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/15.jpg)
Computing and Using a Policy
![Page 16: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/16.jpg)
Markov Decision Process
Finite state space X, action space U
Reward function R(x)
Optimal value function V(x)
Bellman equations If x is terminal:
V(x) = R(x)
If x is non-terminal:
V(x) = R(x) + max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x')
π*(x) = arg max_{u∈U} Σ_{x'∈X} P(x'|x,u) V(x')
![Page 17: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/17.jpg)
Solving Finite MDPs
Value iteration
Improve V(x) by repeatedly "backing up" the value of the best action
Policy iteration
Improve π(x) by repeatedly computing the best action
Linear programming
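Value iteration can be sketched directly from the Bellman equations on the previous slide. The toy 4-state chain below is an assumed example (there is no discount factor, matching the slides, so convergence here relies on every policy eventually reaching the terminal state).

```python
# Value iteration for a finite MDP with terminal states:
# repeatedly back up V(x) = R(x) + max_u sum_x' P(x'|x,u) V(x'),
# then extract the greedy policy pi*(x).
def value_iteration(X, U, P, R, terminal, sweeps=50):
    V = {x: R(x) for x in X}
    for _ in range(sweeps):
        for x in X:
            if not terminal(x):
                V[x] = R(x) + max(
                    sum(P(x2, x, u) * V[x2] for x2 in X) for u in U)
    policy = {x: max(U, key=lambda u: sum(P(x2, x, u) * V[x2] for x2 in X))
              for x in X if not terminal(x)}
    return V, policy

# Toy chain: state 3 is terminal (target lost); reward 1 per surviving step.
X = [0, 1, 2, 3]
U = ["slow", "fast"]
step = {"slow": 1, "fast": 2}
P = lambda x2, x, u: 1.0 if x2 == min(x + step[u], 3) else 0.0
R = lambda x: 0.0 if x == 3 else 1.0
terminal = lambda x: x == 3
V, pi = value_iteration(X, U, P, R, terminal)
```

Moving "slow" survives three steps from state 0, so V[0] converges to 3 and the greedy policy picks "slow"; in practice the sweep would be stopped when V changes by less than a tolerance.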
![Page 18: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/18.jpg)
Continuous Markov Decision Processes
Discretize state and action space
E.g., grids, random samples => perform value/policy iteration/LP formulation on the discrete space
Or discretize the value function
Basis functions => iterative least-squares approaches
![Page 19: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/19.jpg)
Extra Credit for Open House
+5% of final project grade
Preliminary demo & description by next Tuesday
Open House on April 16, 4-7pm
![Page 20: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/20.jpg)
Upcoming Subjects
Sensorless planning (Goldberg, 1994)
An example of a nondeterministic uncertainty model
Planning to sense (Gonzalez-Banos and Latombe 2002)
![Page 21: Planning Under Uncertainty](https://reader035.fdocuments.net/reader035/viewer/2022062520/568132b1550346895d99673d/html5/thumbnails/21.jpg)
Comments on Uncertainty
Reasoning with uncertainty requires representing and transforming sets of states
This is challenging in high-dimensional spaces
Hard to get an accurate, sparse representation
Multi-modal, nonlinear, degenerate distributions
Breakthroughs would have dramatic impact