Deciding Under Probabilistic Uncertainty. Russell and Norvig: Sect. 17.1-3, Chap. 17. CS121 – Winter 2003.
Non-deterministic vs. Probabilistic Uncertainty
Non-deterministic model: an action may lead to any outcome in {a, b, c}; the best decision is the one that is best for the worst case (~ adversarial search).

Probabilistic model: an action leads to outcomes {a (pa), b (pb), c (pc)} with known probabilities; the best decision is the one that maximizes expected utility.
One State/One Action Example

Action A1 from state S0 leads to states S1, S2, S3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70.

U(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62
One State/Two Actions Example

Action A1 from S0: states S1, S2, S3 with probabilities 0.2, 0.7, 0.1 and utilities 100, 50, 70. Action A2 from S0: states S2, S4 with probabilities 0.2, 0.8, where U(S4) = 80.

• U1(S0) = 62
• U2(S0) = 74
• U(S0) = max{U1(S0), U2(S0)} = 74
Introducing Action Costs

Same actions as before, but now A1 costs 5 and A2 costs 25:

• U1(S0) = 62 - 5 = 57
• U2(S0) = 74 - 25 = 49
• U(S0) = max{U1(S0), U2(S0)} = 57
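The computations on these slides can be sketched in a few lines. The helper name `ev` is illustrative, not from the course, and A2's low-probability branch is read as landing in the state with utility 50, which matches U2(S0) = 74:

```python
# Expected-utility choice between two actions, with action costs.
# The helper name `ev` is illustrative only.

def ev(outcomes):
    """Expected utility of an action: sum of P(outcome) * U(outcome)."""
    return sum(p * u for p, u in outcomes)

# A1 from S0: utilities 100, 50, 70 with probabilities 0.2, 0.7, 0.1; cost 5
u1 = ev([(0.2, 100), (0.7, 50), (0.1, 70)]) - 5   # 62 - 5 = 57
# A2 from S0: utilities 50, 80 with probabilities 0.2, 0.8; cost 25
u2 = ev([(0.2, 50), (0.8, 80)]) - 25              # 74 - 25 = 49

u_s0 = max(u1, u2)                                # best action is A1
```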
Example: Finding Juliet
[Map: Charles' office, Juliet's office, and the conference room, with travel times of 10 min, 5 min, and 10 min between them]
Example: Finding Juliet
States:
• S0: Romeo in Charles' office
• S1: Romeo in Juliet's office and Juliet here
• S2: Romeo in Juliet's office and Juliet not here
• S3: Romeo in conference room and Juliet here
• S4: Romeo in conference room and Juliet not here

Actions:
• GJO (go to Juliet's office)
• GCR (go to conference room)
Utility Computation
[Decision tree over states 0-4 under actions GJO and GCR, with action costs -10, -10, -10, -15, -10 (travel times), alongside the office map from the previous slide]
n-Step Decision Process
• There is a single initial state
• States reached after i steps are all different from those reached after j ≠ i steps
• Each state i has its reward R(i)
• Each state reached after n steps is terminal
• The goal is to maximize the sum of rewards
n-Step Decision Process
Utility of state i:

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

Example: two possible actions from i, a1 and a2:
• a1 leads to k11 with probability P11 = P(k11 | a1.i) or to k12 with probability P12 = P(k12 | a1.i)
• a2 leads to k21 with probability P21 or to k22 with probability P22
• U(i) = R(i) + max{P11 U(k11) + P12 U(k12), P21 U(k21) + P22 U(k22)}
n-Step Decision Process
For j = n-1, n-2, …, 0 do:
  For every state si attained after step j:
    Compute the utility of si
    Label that state with the corresponding best action

Utility of state i:

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

Best choice of action at state i (the optimal policy):

π*(i) = arg max_a Σ_k P(k | a.i) U(k)
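The backward sweep can be sketched on a made-up two-step process; the state names, probabilities, and rewards below are invented for illustration:

```python
# Backward induction for an n-step decision process (sketch).
# States are (step, name) pairs; trans[state][action] lists (prob, next_state).

R = {(0, 'i'): 0, (1, 'k11'): 10, (1, 'k12'): 4, (1, 'k21'): 8, (1, 'k22'): 6}
trans = {
    (0, 'i'): {
        'a1': [(0.5, (1, 'k11')), (0.5, (1, 'k12'))],
        'a2': [(0.9, (1, 'k21')), (0.1, (1, 'k22'))],
    },
}

U, policy = {}, {}
# Terminal states (reached after step n): utility is just the reward
for s in R:
    if s not in trans:
        U[s] = R[s]
# Sweep backward over decision states: j = n-1, ..., 0
for s, actions in sorted(trans.items(), reverse=True):
    best_a, best_v = max(
        ((a, sum(p * U[k] for p, k in outs)) for a, outs in actions.items()),
        key=lambda av: av[1])
    U[s] = R[s] + best_v      # U(i) = R(i) + max_a sum_k P(k|a.i) U(k)
    policy[s] = best_a        # label the state with its best action
```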
n-Step Decision Process …
… with costs on actions instead of rewards on states.

Utility of state i:

U(i) = max_a (Σ_k P(k | a.i) U(k) - Ca)

Best choice of action at state i:

π*(i) = arg max_a (Σ_k P(k | a.i) U(k) - Ca)
[The finding-Juliet decision tree (states 0-4, steps 0-3), illustrating the backward computation of utilities and best actions]
Target Tracking
• The robot must keep the target in view
• The target's trajectory is not known in advance
• The environment may or may not be known
States Are Indexed by Time
• State = (robot-position, target-position, time)
• Action = (stop, up, down, right, left)
• Outcome of an action = 5 possible states (one per possible target move), each with probability 0.2

Example: from ([i,j], [u,v], t), action right leads to:
• ([i+1,j], [u,v], t+1)
• ([i+1,j], [u-1,v], t+1)
• ([i+1,j], [u+1,v], t+1)
• ([i+1,j], [u,v-1], t+1)
• ([i+1,j], [u,v+1], t+1)

Each state has 25 successors (5 robot actions x 5 outcomes each).
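Successor generation for one robot action can be sketched as below; this is an illustrative fragment that omits grid bounds and visibility checks:

```python
# Successors of a tracking state under one robot action (here: right).
# State = (robot, target, t); the target makes one of 5 moves, each with
# probability 0.2. Bounds and visibility checks are omitted for brevity.

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # stop, right, left, up, down

def successors(state, robot_move):
    (i, j), (u, v), t = state
    ri, rj = i + robot_move[0], j + robot_move[1]
    return [(((ri, rj), (u + du, v + dv), t + 1), 0.2) for du, dv in MOVES]

succ = successors(((3, 2), (5, 5), 0), (1, 0))
# 5 outcomes per robot action; with 5 robot actions, 25 successors per state
```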
h-Step Planning Process
Planning horizon: h

"Terminal" states:
• States where the target is not visible
• States at depth h

Reward function:
• Target visible: +1
• Target not visible: 0

Maximizing the sum of rewards ~ maximizing escape time.

With discounting: R(state) = γ^t, where 0 < γ < 1.
h-Step Planning Process
Planning horizon: h

The planner computes the optimal policy over this tree of states, but only the first step of the policy is executed. Then everything is repeated again (sliding horizon).
h-Step Planning Process

h is chosen so that the optimal policy over the tree can be computed within one increment of time.
Example With No Planner
Example With Planner
Other Example
h-Step Planning Process

The optimal policy over this tree is not the optimal policy that would have been computed if a prior model of the environment had been available, along with an arbitrarily fast computer.
Simple Robot Navigation Problem
• In each state, the possible actions are U, D, R, and L
Probabilistic Transition Model

• In each state, the possible actions are U, D, R, and L
• The effect of U is as follows (transition model):
  • With probability 0.8 the robot moves up one square (if the robot is already in the top row, then it does not move)
  • With probability 0.1 the robot moves right one square (if the robot is already in the rightmost column, then it does not move)
  • With probability 0.1 the robot moves left one square (if the robot is already in the leftmost column, then it does not move)
Markov Property
The transition properties depend only on the current state, not on previous history (how that state was reached)
Sequence of Actions
• Planned sequence of actions: (U, R)
• U is executed, then R is executed

[4x3 grid, rows 1-3 and columns 1-4]

Starting at [3,2], the possible states after U is executed are [3,3], [4,2], and [3,2].
Histories
• Planned sequence of actions: (U, R)
• U has been executed; R is executed
• There are 9 possible sequences of states, called histories, and 6 possible final states for the robot!

Tree of states: [3,2] → {[3,2], [3,3], [4,2]} → {[3,1], [3,2], [3,3], [4,1], [4,2], [4,3]}
Probability of Reaching the Goal
P([4,3] | (U,R).[3,2]) = P([4,3] | R.[3,3]) x P([3,3] | U.[3,2]) + P([4,3] | R.[4,2]) x P([4,2] | U.[3,2])

• P([3,3] | U.[3,2]) = 0.8
• P([4,2] | U.[3,2]) = 0.1
• P([4,3] | R.[3,3]) = 0.8
• P([4,3] | R.[4,2]) = 0.1
• P([4,3] | (U,R).[3,2]) = 0.8 x 0.8 + 0.1 x 0.1 = 0.65

Note the importance of the Markov property in this derivation.
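The derivation is a two-step marginalization over the intermediate state; the Markov property is what lets each term factor:

```python
# P([4,3] | (U,R).[3,2]) by conditioning on the state reached after U,
# exactly as in the slide's derivation.

p_33_after_U = 0.8   # P([3,3] | U.[3,2])
p_42_after_U = 0.1   # P([4,2] | U.[3,2])
p_43_from_33 = 0.8   # P([4,3] | R.[3,3])
p_43_from_42 = 0.1   # P([4,3] | R.[4,2])

# Sum over intermediate states (the [3,2] branch cannot reach [4,3] in one R)
p_goal = p_43_from_33 * p_33_after_U + p_43_from_42 * p_42_after_U  # 0.65
```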
Utility Function
• The robot needs to recharge its batteries
• [4,3] provides power supply
• [4,2] is a sand area from which the robot cannot escape
• [4,3] and [4,2] are terminal states
• Reward of a terminal state: +1 for [4,3], -1 for [4,2]
• Reward of a non-terminal state: -1/25
• Utility of a history: sum of rewards of traversed states
• Goal: maximize the utility of the history
Histories are potentially unbounded and the same state can be reached many times.
Utility of an Action Sequence
• Consider the action sequence (U,R) from [3,2]
• A run produces one among 7 possible histories, each with some probability
• The utility of the sequence is the expected utility of the histories:

U = Σ_h Uh P(h)
![Page 45: Deciding Under Probabilistic Uncertainty Russell and Norvig: Sect. 17.1-3,Chap. 17 CS121 – Winter 2003.](https://reader035.fdocuments.net/reader035/viewer/2022062803/56649f4a5503460f94c6bdce/html5/thumbnails/45.jpg)
Optimal Action SequenceOptimal Action Sequence
• The optimal sequence is the one with maximal expected utility
• But is the optimal action sequence what we want to compute?

NO!! Except if the sequence is executed blindly (open-loop strategy).
Reactive Agent Algorithm

Assumes the current state is observable.

Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← choose action (given s)
  Perform a
Utility of state i:

U(i) = max_a (Σ_k P(k | a.i) U(k) - Ca)

Best choice of action at state i:

π*(i) = arg max_a (Σ_k P(k | a.i) U(k) - Ca)
Policy (Reactive/Closed-Loop Strategy)
• A policy is a complete mapping from states to actions
Reactive Agent Algorithm

Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← π(s)
  Perform a
Optimal Policy
• A policy is a complete mapping from states to actions
• The optimal policy π* is the one that always yields a history (ending at a terminal state) with maximal expected utility

This makes sense because of the Markov property.

Note that [3,2] is a "dangerous" state that the optimal policy tries to avoid.
Optimal Policy

This problem is called a Markov Decision Problem (MDP).

How to compute π*?
The trick used in target tracking (indexing states by time) can be applied … but would yield a large tree and a sub-optimal policy.
First-Step Analysis
Simulate one step:

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
π*(i) = arg max_a Σ_k P(k | a.i) U(k)

(the principle of Maximum Expected Utility)
What is the Difference?
π*(i) = arg max_a Σ_k P(k | a.i) U(k)

U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
Value Iteration

Initialize the utility of each non-terminal state si to U0(i) = 0.
For t = 0, 1, 2, …, do:
  Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)
Value Iteration

[Plot: Ut([3,1]) as a function of t, rising toward about 0.611 by t ≈ 30]

Converged utilities on the 4x3 grid:

  row 3:  0.812  0.868  0.918   +1
  row 2:  0.762         0.660   -1
  row 1:  0.705  0.655  0.611  0.388

Note the importance of terminal states and of the connectivity of the state-transition graph.

(Not very different from indexing states by time.)
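Value iteration on this grid can be written directly from the update rule. A sketch, assuming the standard layout (obstacle at [2,2], terminals [4,3] and [4,2], 0.8/0.1/0.1 transition noise):

```python
# Value iteration on the 4x3 grid: terminals [4,3] = +1 and [4,2] = -1,
# an obstacle at [2,2], step reward -1/25, and noisy 0.8/0.1/0.1 moves.

STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}
DELTA = {'U': (0, 1), 'D': (0, -1), 'L': (-1, 0), 'R': (1, 0)}
PERP = {'U': 'LR', 'D': 'LR', 'L': 'UD', 'R': 'UD'}  # perpendicular slips

def move(s, a):
    nx, ny = s[0] + DELTA[a][0], s[1] + DELTA[a][1]
    return (nx, ny) if (nx, ny) in STATES else s  # blocked moves stay put

def outcomes(s, a):
    yield 0.8, move(s, a)            # intended direction
    for side in PERP[a]:
        yield 0.1, move(s, side)     # sideways slip

U = {s: TERMINAL.get(s, 0.0) for s in STATES}
for _ in range(60):                  # enough sweeps to converge here
    U = {s: U[s] if s in TERMINAL else
         -1 / 25 + max(sum(p * U[k] for p, k in outcomes(s, a)) for a in DELTA)
         for s in STATES}
```

After convergence, U[(3,1)] lands near the 0.611 shown on the slide's plot.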
Policy Iteration
Pick a policy π at random
Repeat:
  Compute the utility of each state for π:
    Uπ(i) = R(i) + Σ_k P(k | π(i).i) Uπ(k)
  (a set of linear equations, often a sparse system)
  Compute the policy π' given these utilities:
    π'(i) = arg max_a Σ_k P(k | a.i) Uπ(k)
  If π' = π then return π, else π ← π'
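The loop can be sketched on a tiny invented MDP. Evaluation here is done by iterating the equations to a fixed point; on large problems one would solve the sparse linear system directly, as the slide notes:

```python
# Policy iteration on a small made-up MDP. The evaluation step iterates
# U_pi(i) = R(i) + sum_k P(k | pi(i).i) U_pi(k) to a fixed point.

R = {'s0': 0.0, 's1': 0.0, 'goal': 1.0}
P = {  # P[state][action] = list of (probability, next_state)
    's0': {'a': [(0.9, 's1'), (0.1, 's0')], 'b': [(1.0, 's0')]},
    's1': {'a': [(0.8, 'goal'), (0.2, 's0')], 'b': [(1.0, 's1')]},
}

def evaluate(pi, sweeps=200):
    U = {'s0': 0.0, 's1': 0.0, 'goal': R['goal']}  # terminal fixed at its reward
    for _ in range(sweeps):
        for s in P:
            U[s] = R[s] + sum(p * U[k] for p, k in P[s][pi[s]])
    return U

pi = {s: 'b' for s in P}              # pick a policy at random
while True:
    U = evaluate(pi)
    new_pi = {s: max(P[s], key=lambda a, s=s: sum(p * U[k] for p, k in P[s][a]))
              for s in P}
    if new_pi == pi:
        break                         # policy is stable: return pi
    pi = new_pi
```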
Application of First-Step Analysis: Computing the Probability of Folding (p_fold) of a Protein
A conformation in the Unfolded state reaches the Folded state with probability p_fold and returns to the Unfolded state with probability 1 - p_fold. (The example protein pictured is HIV integrase.)
Computation Through Simulation
10K to 30K independent simulations
Capture the stochastic nature of molecular motion by a probabilistic roadmap

[Roadmap graph: nodes vi, vj connected by edges with transition probabilities Pij]
Edge Probabilities

Follow the Metropolis criterion:

  Pij = (1/Ni) exp(-ΔEij / kB T)  if ΔEij > 0
  Pij = 1/Ni                      otherwise

where Ni is the number of neighbors of vi, ΔEij is the energy difference between vj and vi, kB is Boltzmann's constant, and T is the temperature.

Self-transition probability:

  Pii = 1 - Σ_{j≠i} Pij
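The criterion can be sketched as follows; the energies and the temperature constant below are made-up numbers, and `edge_probs` is an illustrative helper:

```python
import math

# Metropolis edge probabilities for one roadmap node (sketch).
# kT stands in for kB * T; the values are arbitrary illustrations.

kT = 0.6

def edge_probs(E_i, E_neighbors, kT=kT):
    """P_ij = (1/N_i) exp(-dE/kT) if dE > 0, else 1/N_i; P_ii gets the rest."""
    n = len(E_neighbors)
    probs = []
    for E_j in E_neighbors:
        dE = E_j - E_i
        probs.append(math.exp(-dE / kT) / n if dE > 0 else 1.0 / n)
    p_self = 1.0 - sum(probs)   # self-transition absorbs the remainder
    return probs, p_self

probs, p_self = edge_probs(1.0, [0.5, 1.2, 3.0])
# Downhill moves get the full 1/N_i; uphill moves are exponentially damped
```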
First-Step Analysis

F: Folded set; U: Unfolded set.

[Node i with self-transition probability Pii and transition probabilities Pij, Pik, Pil, Pim to its neighbors j, k, l, m]
Let fi = p_fold(i). After one step:

fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm

with fi = 1 for nodes in the Folded set and fi = 0 for nodes in the Unfolded set.

• One linear equation per node; the solution gives p_fold for all nodes
• No explicit simulation run
• All pathways are taken into account
• Sparse linear system
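A minimal sketch of this first-step analysis on a made-up 1-D chain of conformations, sweeping the linear equations to a fixed point rather than calling a sparse solver:

```python
# p_fold by first-step analysis on an invented 1-D chain of 5 conformations:
# node 0 is in the Unfolded set (f = 0), node 4 is in the Folded set (f = 1),
# and each interior node hops to either neighbor with probability 0.5.

N = 5
f = [0.0] * N
f[N - 1] = 1.0                      # folded boundary condition
for _ in range(2000):               # sweep f_i = sum_j P_ij f_j to a fixed point
    for i in range(1, N - 1):       # boundary values stay fixed
        f[i] = 0.5 * f[i - 1] + 0.5 * f[i + 1]
# For a symmetric chain p_fold grows linearly: f -> [0, 0.25, 0.5, 0.75, 1]
```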
Partially Observable Markov Decision Problem
• Uncertainty in sensing: a sensing operation returns multiple states, with a probability distribution
Example: Target Tracking
There is uncertainty in the robot's and target's positions, and this uncertainty grows with further motion.

There is a risk that the target escapes behind the corner, requiring the robot to move appropriately.

But there is a positioning landmark nearby. Should the robot try to reduce its position uncertainty?
Summary

• Probabilistic uncertainty
• Utility function
• Optimal policy
• Maximal expected utility
• Value iteration
• Policy iteration