Computer Science CPSC 502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)
Computer Science CPSC 502
Lecture 12
Decisions Under Uncertainty
(Ch. 9, up to 9.3)
Representational Dimensions
[Figure: map of representational dimensions. Environment: Deterministic vs. Stochastic; Problem Type: Static (Query) vs. Sequential (Planning). Representations include Constraint Satisfaction (Vars + Constraints), Logics, STRIPS, Belief Nets, Decision Nets, and Markov Processes; reasoning techniques include Search, Arc Consistency, Variable Elimination, Value Iteration, and Approximate/Temporal Inference.]
This concludes the module on answering queries in stochastic environments
Now we will look at acting in stochastic environments
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Decisions Under Uncertainty: Intro
• Earlier in the course, we focused on decision making in deterministic domains
– Search/CSPs: single-stage decisions
– Planning: sequential decisions
• Now we face stochastic domains
– So far we've considered how to represent and update beliefs
– What if an agent has to make decisions (act) under uncertainty?
• Making decisions under uncertainty is important
– We represent the world probabilistically so that we can use our beliefs as the basis for making decisions
Decisions Under Uncertainty: Intro
• An agent's decision will depend on
– What actions are available
– What beliefs the agent has
– Which goals the agent has
• Differences between the deterministic and stochastic settings
– Obvious difference in representation: we need to represent our uncertain beliefs
– Actions will be pretty straightforward: represented as decision variables
– Goals will be interesting: we'll move from all-or-nothing goals to a richer notion:
• rating how happy the agent is in different situations
– Putting these together, we'll extend Bayesian networks into a new representation called Decision Networks
Delivery Robot Example
• The robot needs to reach a certain room
• The robot can go
• the short way: faster, but with more obstacles, thus more prone to accidents that can damage the robot
• the long way: slower, but less prone to accidents
• Which way to go? Is it more important for the robot to arrive fast, or to minimize the risk of damage?
• The robot can choose to wear pads to protect itself in case of an accident, or not to wear them. Pads slow it down
• Again, there is a tradeoff between reducing the risk of damage and arriving fast
• Possible outcomes
• No pads, no accident
• Pads, no accident
• Pads, accident
• No pads, accident
Next
• We'll see how to represent and reason about situations of this nature using decision trees, as well as
• Probability, to measure the uncertainty in action outcomes
• Utility, to measure the agent's preferences over the various outcomes
• These are combined into a measure of expected utility that can be used to identify the action with the best expected outcome
• This is the best that an intelligent agent can do when it needs to act in a stochastic environment
Decision Tree for the Delivery Robot Example
• Decision variable 1: the robot can choose to wear pads
– Yes: protection against accidents, but extra weight
– No: fast, but no protection
• Decision variable 2: the robot can choose the way
– Short way: quick, but higher chance of accident
– Long way: safe, but slow
• Random variable: is there an accident?
[Figure: decision tree in which the agent decides the values of the decision variables and chance decides the value of the random variable.]
Possible worlds and decision variables
• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
– the probabilities of the worlds satisfying that assignment sum to 1
[Figure: the decision tree annotated with the probability of an accident for each choice of way: 0.2 vs. 0.8 for the short way, 0.01 vs. 0.99 for the long way.]
Utility
• Utility: a measure of the desirability of possible worlds to an agent
– Let U be a real-valued function such that U(w) represents the agent's degree of preference for world w
– Here, expressed by a number in [0, 100]
• Simple goals can still be specified
– Worlds that satisfy the goal have utility 100
– All other worlds have utility 0
• Utilities can be more complicated
– For example, in the robot delivery domain, they could involve
• Did the robot reach the target room?
• Time taken
• Amount of damage
• Energy left
Utility for the Robot Example
• What would be a reasonable utility function for our robot?
• Which are the best and worst scenarios?
[Figure: the decision tree again, with accident probabilities on the chance branches (0.01/0.99 for the long way, 0.2/0.8 for the short way) and a utility to be assigned to each leaf.]
Utility / Preferences
• Utility: a measure of the desirability of possible worlds to an agent
– Let U be a real-valued function such that U(w) represents the agent's degree of preference for world w.
Would this be a reasonable utility function for our robot?

Which way   Accident   Wear Pads   Utility   World
short       true       true        35        w0, moderate damage
short       false      true        95        w1, reaches room, quick, extra weight
long        true       true        30        w2, moderate damage, low energy
long        false      true        75        w3, reaches room, slow, extra weight
short       true       false       3         w4, severe damage
short       false      false       100       w5, reaches room, quick
long        true       false       0         w6, severe damage, low energy
long        false      false       80        w7, reaches room, slow
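The utility table above can be encoded as a small lookup for later calculations (a minimal sketch in Python; the variable names are illustrative, the values are taken from the table):

```python
# Utility U(w) for each world, keyed by (which_way, accident, wear_pads).
utility = {
    ("short", True,  True):  35,   # w0, moderate damage
    ("short", False, True):  95,   # w1, reaches room, quick, extra weight
    ("long",  True,  True):  30,   # w2, moderate damage, low energy
    ("long",  False, True):  75,   # w3, reaches room, slow, extra weight
    ("short", True,  False):  3,   # w4, severe damage
    ("short", False, False): 100,  # w5, reaches room, quick
    ("long",  True,  False):  0,   # w6, severe damage, low energy
    ("long",  False, False): 80,   # w7, reaches room, slow
}

best = max(utility, key=utility.get)   # best scenario: no pads, short way, no accident
worst = min(utility, key=utility.get)  # worst scenario: no pads, long way, accident
```

This directly answers the "best and worst scenarios" question: w5 (utility 100) and w6 (utility 0).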
Utility: Simple Goals
– Can simple (boolean) goals still be specified?
– Recall: worlds that satisfy the goal get utility 100, all other worlds get utility 0

Which way   Accident   Wear Pads   Utility
long        true       true        ?
long        true       false       ?
long        false      true        ?
long        false      false       ?
short       true       true        ?
short       true       false       ?
short       false      true        ?
short       false      false       ?
Utility for the Robot Example
• Now, how do we combine utility and probability to decide what to do?
[Figure: the decision tree with a probability on each chance branch (0.01/0.99 for the long way, 0.2/0.8 for the short way) and a utility at each leaf (35, 95, 30, 75, 3, 100, 0, 80).]
Optimal decisions: combining utility and probability
• Each set of decisions defines a probability distribution over possible outcomes
• Each outcome has a utility
• For each set of decisions, we need to know its expected utility
– the value for the agent of achieving a certain probability distribution over outcomes (possible worlds)
• The expected utility of a set of decisions is obtained by
• weighting the utility of the relevant possible worlds by their probability
• We want to find the decision with maximum expected utility
[Figure: e.g., utilities 35 and 95 reached with probabilities 0.2 and 0.8. What is the expected value of this scenario?]
Expected utility of a decision
• The expected utility of decision D = di is

E(U | D = di) = Σ_{w ⊨ D = di} P(w) U(w)

• What is the expected utility of WearPads = yes, Way = short?
[Figure: the decision tree with probabilities (0.01/0.99 for long, 0.2/0.8 for short) and utilities (35, 95, 30, 75, 3, 100, 0, 80) at the leaves.]
Expected utility of a decision
• The expected utility of decision D = di is

E(U | D = di) = Σ_{w ⊨ D = di} P(w) U(w)

• What is the expected utility of WearPads = yes, Way = short?
0.2 * 35 + 0.8 * 95 = 83

E[U | D] for each decision:
(pads, short):     0.2 * 35 + 0.8 * 95   = 83
(pads, long):      0.01 * 30 + 0.99 * 75 = 74.55
(no pads, short):  0.2 * 3 + 0.8 * 100   = 80.6
(no pads, long):   0.01 * 0 + 0.99 * 80  = 79.2
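The definition above can be checked numerically by enumerating the possible worlds for each decision (a minimal sketch in Python; names are illustrative, the probabilities and utilities come from the example):

```python
# Expected utility of each (which_way, wear_pads) decision:
# E(U | D=d) = sum over worlds w satisfying D=d of P(w) * U(w).

# P(Accident = true | WhichWay), from the example.
p_accident = {"short": 0.2, "long": 0.01}

# U(w) keyed by (which_way, accident, wear_pads), from the utility table.
utility = {
    ("short", True,  True):  35, ("short", False, True):  95,
    ("long",  True,  True):  30, ("long",  False, True):  75,
    ("short", True,  False):  3, ("short", False, False): 100,
    ("long",  True,  False):  0, ("long",  False, False): 80,
}

def expected_utility(way: str, pads: bool) -> float:
    p = p_accident[way]
    return p * utility[(way, True, pads)] + (1 - p) * utility[(way, False, pads)]

# The four expected utilities from the slide: 83, 74.55, 80.6, 79.2.
eu = {(way, pads): expected_utility(way, pads)
      for way in ("short", "long") for pads in (True, False)}
```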
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Single Action vs. Sequence of Actions
• Single action (aka one-off decisions)
– One or more primitive decisions that can be treated as a single macro decision, to be made before acting
– E.g., "WearPads" and "WhichWay" can be combined into the macro decision (WearPads, WhichWay) with domain {yes, no} × {long, short}
• Sequence of actions (sequential decisions)
– Repeat:
• make observations
• decide on an action
• carry out the action
– The agent has to take actions without knowing what the future brings
• This is fundamentally different from everything we've seen so far
• Planning was sequential, but we could still think first and then act
Optimal single-stage decision
• Given a single (macro) decision variable D
– the agent can choose D = di for any value di ∈ dom(D)
What is the optimal decision in the example?
[Figure: the decision tree with conditional probabilities and utilities.]

Expected utility of each decision:
(pads, short):     0.2 * 35 + 0.8 * 95   = 83
(pads, long):      0.01 * 30 + 0.99 * 75 = 74.55
(no pads, short):  0.2 * 3 + 0.8 * 100   = 80.6
(no pads, long):   0.01 * 0 + 0.99 * 80  = 79.2

Which is the optimal decision here?
Optimal decision in the robot delivery example
[Figure: the same tree, with expected utilities 83, 74.55, 80.6, and 79.2 for the four decisions.]
Best decision: (wear pads, short way), with expected utility 83
Single-Stage Decision Networks
• Extend belief networks with:
– Decision nodes, whose value the agent chooses
• Parents: only other decision nodes allowed
• Domain is the set of possible actions
• Drawn as a rectangle
– Exactly one utility node
• Parents: all random and decision variables on which the utility depends
• Does not have a domain
• Drawn as a diamond
• Explicitly shows dependencies
– E.g., which variables affect the probability of an accident?
Types of nodes in decision networks
• A random variable is drawn as an ellipse.
– Arcs into the node represent probabilistic dependence
– As in Bayesian networks: a random variable is conditionally independent of its non-descendants given its parents
• A decision variable is drawn as a rectangle.
– Arcs into the node represent information available when the decision is made
• A utility node is drawn as a diamond.
– Arcs into the node represent variables that the utility depends on
– It specifies a utility for each instantiation of its parents
Example Decision Network
Decision nodes do not have an associated table.
The utility node does not have a domain.

Which way   Accident   Wear Pads   Utility
long        true       true        30
long        true       false       0
long        false      true        75
long        false      false       80
short       true       true        35
short       true       false       3
short       false      true        95
short       false      false       100

Which Way W   Accident A   P(A|W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8
Computing the optimal decision: we can use VE
• Denote
– the random variables as X1, …, Xn
– the decision variable as D
– the parents of node N as pa(N)
• To find the optimal decision we can use VE:
1. Create a factor for each conditional probability and for the utility
2. Sum out all random variables, one at a time
• This creates a factor on D that gives the expected utility for each di
3. Choose the di with the maximum value in the factor

E(U | D) = Σ_{X1, …, Xn} P(X1, …, Xn | D) U(pa(U))
         = Σ_{X1, …, Xn} [ Π_i P(Xi | pa(Xi)) ] U(pa(U))

where each pa(Xi) may include decision variables.
VE Example: Step 1, create initial factors

Abbreviations: W = Which Way, P = Wear Pads, A = Accident

f1(A,W), from P(A|W):
Which Way W   Accident A   f1(A,W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8

f2(A,W,P), from the utility table:
Which way W   Accident A   Pads P   f2(A,W,P)
long          true         true     30
long          true         false    0
long          false        true     75
long          false        false    80
short         true         true     35
short         true         false    3
short         false        true     95
short         false        false    100

E(U) = Σ_A P(A|W) U(A,W,P) = Σ_A f1(A,W) f2(A,W,P)
VE example: step 2, sum out A
Step 2a: compute the product f(A,W,P) = f1(A,W) × f2(A,W,P)

f(A=a, W=w, P=p) = f1(A=a, W=w) × f2(A=a, W=w, P=p)

Which way W   Accident A   Pads P   f(A,W,P)
long          true         true     0.01 * 30
long          true         false    0.01 * 0
long          false        true     0.99 * 75
long          false        false    0.99 * 80
short         true         true     0.2 * 35
short         true         false    0.2 * 3
short         false        true     0.8 * 95
short         false        false    0.8 * 100
VE example: step 2, sum out A
Step 2b: sum A out of the product f(A,W,P):

f3(W,P) = Σ_A f(A,W,P)

Which way W   Pads P   f3(W,P)
long          true     0.01 * 30 + 0.99 * 75 = 74.55
long          false    0.01 * 0 + 0.99 * 80  = 79.2
short         true     0.2 * 35 + 0.8 * 95   = 83
short         false    0.2 * 3 + 0.8 * 100   = 80.6

The final factor encodes the expected utility of each decision.
• Thus, taking the short way and wearing pads is the best choice, with an expected utility of 83
VE example: step 3, choose the decision with maximum E(U)

Which way W   Pads P   f3(W,P)
long          true     74.55
long          false    79.2
short         true     83     <- maximum
short         false    80.6

The optimal decision is WearPads = true, WhichWay = short.
Variable Elimination for Single-Stage Decision Networks: Summary
1. Create a factor for each conditional probability and for the utility
2. Sum out all random variables, one at a time
– This creates a factor on D that gives the expected utility for each di
3. Choose the di with the maximum value in the factor
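The three steps above can be sketched with explicit factors (a minimal sketch in Python; factors are plain dicts and the names are illustrative):

```python
from itertools import product

# Step 1: initial factors from the example network.
# f1(A, W) = P(A | W); f2(A, W, P) = U(A, W, P).
f1 = {(True, "long"): 0.01, (False, "long"): 0.99,
      (True, "short"): 0.2, (False, "short"): 0.8}
f2 = {(True, "long", True): 30,   (True, "long", False): 0,
      (False, "long", True): 75,  (False, "long", False): 80,
      (True, "short", True): 35,  (True, "short", False): 3,
      (False, "short", True): 95, (False, "short", False): 100}

# Step 2: multiply the factors and sum out the random variable A.
# The resulting factor f3(W, P) gives the expected utility of each decision.
f3 = {}
for w, p in product(("long", "short"), (True, False)):
    f3[(w, p)] = sum(f1[(a, w)] * f2[(a, w, p)] for a in (True, False))

# Step 3: choose the decision with the maximum value in the factor.
best = max(f3, key=f3.get)  # short way, wearing pads
```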
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Sequential Decision Problems
• An intelligent agent doesn't make a multi-step decision and carry it out blindly
– It would take the new observations it makes into account
• A more typical scenario:
– The agent observes, acts, observes, acts, …
• Subsequent actions can depend on what is observed
– What is observed often depends on previous actions
– Often the sole reason for carrying out an action is to provide information for future actions
• For example: diagnostic tests, spying
• General decision networks:
– Just like single-stage decision networks, with one exception: the parents of decision nodes can include random variables
Sequential decisions: simplest possible case
• Only one decision! (but different from one-off decisions)
• It is early in the morning. Shall I take my umbrella today? (I'll have to go for a long walk at noon)
• Relevant random variables?
Sequential Decision Problems: Example
• In our fire alarm domain
• If there is a report, you can decide to call the fire department
• Before doing that, you can decide to check whether you can see smoke, but this takes time and will delay calling
• A decision (e.g., Call) can depend on a random variable (e.g., SeeSmoke)
[Figure: decision network. Decision nodes: the agent decides; chance nodes: chance decides.]
• Each decision Di has an information set of variables pa(Di), whose values will be known at the time decision Di is made
– pa(CheckSmoke) = {Report}
– pa(Call) = {Report, CheckSmoke, SeeSmoke}
Sequential Decision Problems
• What should an agent do?
– The agent observes, acts, observes, acts, …
– Subsequent actions can depend on what is observed
• What is observed often depends on previous actions
• The agent needs a conditional plan of what it will do given every possible set of circumstances
• We will formalize this conditional plan as a policy
Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions
δi : dom(pa(Di)) → dom(Di)

This policy means that when the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

A decision function needs to specify a value for each instantiation of its parents. There are 2^2 = 4 possible decision functions δcs for CheckSmoke:

Report   δcs1   δcs2   δcs3   δcs4
true     T      T      F      F
false    T      F      T      F
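The counting argument can be checked by brute-force enumeration (a minimal sketch; True/False stand for the two possible values of a binary decision):

```python
from itertools import product

def decision_functions(n_parent_instantiations: int):
    """All decision functions for a binary decision variable: one
    True/False choice per instantiation of the decision's parents."""
    return list(product((True, False), repeat=n_parent_instantiations))

# CheckSmoke has one binary parent (Report): 2 instantiations, 2^2 = 4 functions.
delta_cs = decision_functions(2)

# Call has three binary parents (Report, CheckSmoke, SeeSmoke):
# 8 instantiations, 2^8 = 256 functions.
delta_call = decision_functions(8)
```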
Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions
δi : dom(pa(Di)) → dom(Di)

When the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

Call has three binary parents (Report, CheckSmoke, SeeSmoke), so there are 2^8 = 256 possible decision functions δcall for Call. Columns below are the eight instantiations of (R, CS, SS), from (t,t,t) to (f,f,f):

           ttt  ttf  tft  tff  ftt  ftf  fft  fff
δcall1     T    T    T    T    T    T    T    T
δcall2     T    T    T    T    T    T    T    F
δcall3     T    T    T    T    T    T    F    T
δcall4     T    T    T    T    T    T    F    F
δcall5     T    T    T    T    T    F    T    T
…          …    …    …    …    …    …    …    …
δcall256   F    F    F    F    F    F    F    F
Policies for Sequential Decision Problems
• Policies for sequential decisions are the counterpart of single decisions for one-off decisions
– They are just much more complex, because they tell the agent what to do at any step of the decision sequence, given any possible combination of available information

Sample decision (one-off): one of the four macro decisions d1–d4

      Which way   Wear Pads
d1    long        true
d2    long        false
d3    short       true
d4    short       false

Sample policy (sequential): a pair of decision functions, e.g. (δcs2, δcallk)

Report   δcs1   δcs2   δcs3   δcs4
true     t      t      f      f
false    t      f      t      f

Report   CheckS   SeeS    δcall1   …   δcallk   δcalln
true     true     true    true         true     true
true     true     false   false        false    true
true     false    true    true         true     true
true     false    false   false        false    true
false    true     true    true         true     true
false    true     false   false        false    true
false    false    true    false        false    true
false    false    false   false        false    true
How many policies are there?
• If a decision D has k binary parents, there are 2^k different assignments of values to the parents
• If there are b possible values for a decision variable with k binary parents, there are b^(2^k) different decision functions for that variable
– because there are 2^k possible instantiations of the parents, and for every instantiation the decision function could pick any of the b values
• If there are d decision variables, each with k binary parents and b possible values
– There are (b^(2^k))^d different policies
– because there are b^(2^k) possible decision functions for each decision, and a policy is a combination of d such decision functions
• We want to find the "most promising" policy given the inherent uncertainty, i.e. the policy with the maximum expected utility
– But of course we want to avoid enumerating all the policies: VE
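The counting formula above is easy to check numerically (a trivial sketch; the sample numbers are illustrative):

```python
def num_policies(b: int, k: int, d: int) -> int:
    """Number of policies for d decision variables, each with k binary
    parents and b possible values: (b^(2^k))^d."""
    return (b ** (2 ** k)) ** d

# CheckSmoke alone: b=2 values, k=1 binary parent -> 2^2 = 4 decision functions.
single_cs = num_policies(2, 1, 1)
# Call alone: b=2 values, k=3 binary parents -> 2^8 = 256 decision functions.
single_call = num_policies(2, 3, 1)
```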
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Expected Value of a Policy
• Like one-off decisions, policies are selected based on their expected utility
• Each possible world w has a probability P(w) and a utility U(w)
• The expected utility of policy π is

E(U | π) = Σ_{w ⊨ π} P(w) U(w)     (the sum ranges over the possible worlds that satisfy the policy)

• The optimal policy is one with the maximum expected utility.
When does a possible world satisfy a policy?
• Possible world w satisfies policy π, written w ⊨ π, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).

Policy π:
Decision function δcs2:
Report   CheckSmoke
true     true
false    false

Decision function δcall20:
Report   CheckSmoke   SeeSmoke   Call
true     true         true       false
true     true         false      false
true     false        true       true
true     false        false      false
false    true         true       true
false    true         false      false
false    false        true       false
false    false        false      false

Possible world w2 (values of all VARs):
Fire = true, Tampering = false, Alarm = true, Leaving = true, Report = true, Smoke = true, SeeSmoke = true, CheckSmoke = true, Call = true

w2 ⊨ π ?
One last operation on factors: maxing out a variable
• Maxing out a variable is similar to marginalization
– But instead of taking the sum of some values, we take the max

(max_{X1} f)(X2, …, Xj) = max_{x1 ∈ dom(X1)} f(X1 = x1, X2, …, Xj)

Example: maxB f3(A,B,C) = f4(A,C)

B   A   C   f3(A,B,C)
t   t   t   0.03
t   t   f   0.07
f   t   t   0.54
f   t   f   0.36
t   f   t   0.06
t   f   f   0.14
f   f   t   0.48
f   f   f   0.32

A   C   f4(A,C)
t   t   0.54
t   f   0.36
f   t   0.48
f   f   0.32
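Maxing out can be sketched as a small operation on dict-based factors (a minimal sketch; the factor values are taken from the example table):

```python
# Max out variable B from factor f3, keyed by (b, a, c):
# f4(a, c) = max over b of f3(b, a, c).
f3 = {("t", "t", "t"): 0.03, ("t", "t", "f"): 0.07,
      ("f", "t", "t"): 0.54, ("f", "t", "f"): 0.36,
      ("t", "f", "t"): 0.06, ("t", "f", "f"): 0.14,
      ("f", "f", "t"): 0.48, ("f", "f", "f"): 0.32}

f4 = {}
for (b, a, c), value in f3.items():
    f4[(a, c)] = max(f4.get((a, c), float("-inf")), value)
```

Taking an arg max instead of a max would additionally record which value of B achieved the maximum for each (a, c), which is how decision functions are recovered later.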
The no-forgetting property
• A decision network has the no-forgetting property if
– Decision variables are totally ordered: D1, …, Dm
– If a decision Di comes before Dj, then
• Di is a parent of Dj
• any parent of Di is also a parent of Dj
Idea for finding optimal policies with VE
• Idea for finding optimal policies with variable elimination (VE): dynamic programming: precompute optimal future decisions
– Consider the last decision D to be made
• Find the optimal decision D = d for each instantiation of D's parents
– For each instantiation of D's parents, this is just a single-stage decision problem
• Create a factor of these maximum values: max out D
– I.e., for each instantiation of the parents, what is the best utility I can achieve by making this last decision optimally?
• Recurse to find the optimal policy for the reduced network (now with one less decision)
Finding optimal policies with VE
1. Create a factor for each CPT and a factor for the utility
2. While there are still decision variables
– 2a: Sum out random variables that are not parents of a decision node
• E.g., Tampering, Fire, Alarm, Smoke, Leaving
– 2b: Max out the last decision variable D in the total ordering
• Keep track of its decision function
3. Sum out any remaining variables: this is the expected utility of the optimal policy.

This is Algorithm VE_DN in P&M, Section 9.3.3, p. 393
Eliminating the decision variables: step 2b details
• Select the variable D that corresponds to the latest decision to be made
– this variable will appear in only one factor with (some of) its parents
• Eliminate D by maximizing. This returns:
– The optimal decision function for D: arg maxD f
– A new factor to use in VE: maxD f
• Repeat until there are no more decision nodes.

Example: eliminate CheckSmoke

Report   CheckSmoke   Value
true     true         -5.0
true     false        -5.6
false    true         -23.7
false    false        -17.5

New factor (max over CheckSmoke):
Report   Value
true     -5.0
false    -17.5

Decision function (arg max over CheckSmoke):
Report   CheckSmoke
true     true
false    false
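The CheckSmoke elimination above can be sketched in the same dict-based style (a minimal sketch; the Value factor is taken from the example table):

```python
# Factor over (Report, CheckSmoke) remaining after earlier eliminations.
value = {(True, True): -5.0, (True, False): -5.6,
         (False, True): -23.7, (False, False): -17.5}

new_factor = {}         # max over CheckSmoke, for each value of Report
decision_function = {}  # arg max: the optimal CheckSmoke choice per Report value
for report in (True, False):
    best_cs = max((True, False), key=lambda cs: value[(report, cs)])
    decision_function[report] = best_cs
    new_factor[report] = value[(report, best_cs)]
```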
Computational complexity of VE for finding optimal policies
• We saw: for d decision variables (each with k binary parents and b possible actions), there are (b^(2^k))^d policies
– All combinations of the b^(2^k) decision functions per decision
• Variable elimination saves the final exponent:
– Dynamic programming: consider each decision function only once
– Resulting complexity: O(d * b^(2^k))
– Much faster than enumerating policies (or searching in policy space), but still doubly exponential
– Solution: approximation algorithms for finding optimal policies
Value of Information
• What would help the agent make a better Umbrella decision?
• The value of information of a random variable X for decision D is:
• the utility of the network with an arc from X to D, minus the utility of the network without the arc.
Value of Information (cont.)
• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect weather forecast worth?
• Original maximum expected utility: 77
• Maximum expected utility when we know Weather: 91
• A better forecast is worth at most: 91 - 77 = 14
Value of Information
• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect fire sensor worth?
• Original maximum expected utility: -22.6
• Maximum expected utility when we know Fire: -2
• A perfect fire sensor is worth: -2 - (-22.6) = 20.6
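Both calculations above are instances of the same difference (a trivial sketch; the function name is illustrative):

```python
def value_of_information(meu_with_arc: float, meu_without_arc: float) -> float:
    """Value of information of X for decision D: the expected utility of the
    network with the arc from X to D minus that of the network without it."""
    return meu_with_arc - meu_without_arc

forecast_worth = value_of_information(91, 77)        # perfect weather forecast
fire_sensor_worth = value_of_information(-2, -22.6)  # perfect fire sensor
```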
Learning Goals For Decision Under Uncertainty
• One-off decisions
– Compare and contrast stochastic single-stage (one-off) decisions vs. multi-stage (sequential) decisions
– Define a utility function on possible worlds
– Define and compute optimal one-off decisions
– Represent one-off decisions as single-stage decision networks
– Compute optimal decisions by variable elimination
• Sequential decision networks
– Represent sequential decision problems as decision networks
– Explain the no-forgetting property
• Policies
– Verify whether a possible world satisfies a policy
– Define the expected utility of a policy
– Compute the number of policies for a decision problem
– Compute the optimal policy by variable elimination
• Define and compute the value of information