Computer Science CPSC 502 Lecture 12 Decisions Under Uncertainty (Ch. 9, up to 9.3)
Computer Science CPSC 502
Lecture 12
Decisions Under Uncertainty
(Ch. 9, up to 9.3)
Representational Dimensions
[Figure: map of representational dimensions. Environment: Deterministic vs. Stochastic; Problem Type: Static (Query) vs. Sequential (Planning). Representations include Constraint Satisfaction (Vars + Constraints), Logics, STRIPS, Belief Nets, Decision Nets, and Markov Processes; reasoning techniques include Search, Arc Consistency, Variable Elimination, Value Iteration, and Approximate/Temporal Inference.]
This concludes the module on answering queries in stochastic environments
Now we will look at acting in stochastic environments
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Decisions Under Uncertainty: Intro
• Earlier in the course, we focused on decision making in deterministic domains
– Search/CSPs: single-stage decisions
– Planning: sequential decisions
• Now we face stochastic domains
– So far we've considered how to represent and update beliefs
– What if an agent has to make decisions (act) under uncertainty?
• Making decisions under uncertainty is important
– We represent the world probabilistically so that we can use our beliefs as the basis for making decisions
Decisions Under Uncertainty: Intro
• An agent's decision will depend on
– What actions are available
– What beliefs the agent has
– Which goals the agent has
• Differences between the deterministic and stochastic settings
– Obvious difference in representation: we need to represent our uncertain beliefs
– Actions will be pretty straightforward: represented as decision variables
– Goals will be interesting: we'll move from all-or-nothing goals to a richer notion:
• rating how happy the agent is in different situations
– Putting these together, we'll extend Bayesian networks into a new representation called Decision Networks
Delivery Robot Example
• The robot needs to reach a certain room
• The robot can go
• the short way: faster, but with more obstacles, thus more prone to accidents that can damage the robot
• the long way: slower, but less prone to accidents
• Which way to go? Is it more important for the robot to arrive fast, or to minimize the risk of damage?
• The robot can choose to wear pads to protect itself in case of an accident, or not to wear them. Pads slow it down
• Again, there is a tradeoff between reducing the risk of damage and arriving fast
• Possible outcomes
• No pads, no accident
• Pads, no accident
• Pads, accident
• No pads, accident
Next
• We'll see how to represent and reason about situations of this nature using decision trees, as well as
• Probability, to measure the uncertainty in action outcomes
• Utility, to measure the agent's preferences over the various outcomes
• These are combined into a measure of expected utility that can be used to identify the action with the best expected outcome
• This is the best that an intelligent agent can do when it needs to act in a stochastic environment
Decision Tree for the Delivery Robot Example
• Decision variable 1: the robot can choose to wear pads
– Yes: protection against accidents, but extra weight
– No: fast, but no protection
• Decision variable 2: the robot can choose the way
– Short way: quick, but higher chance of accident
– Long way: safe, but slow
• Random variable: is there an accident?
[Figure: decision tree in which the agent decides the values of the decision variables and chance decides the value of the random variable.]
Possible worlds and decision variables
• A possible world specifies a value for each random variable and each decision variable
• For each assignment of values to all decision variables
– the probabilities of the worlds satisfying that assignment sum to 1
[Figure: the decision tree annotated with the probability of an accident for each choice of way: 0.2 vs. 0.8 for the short way, 0.01 vs. 0.99 for the long way.]
Utility
• Utility: a measure of the desirability of possible worlds to an agent
– Let U be a real-valued function such that U(w) represents the agent's degree of preference for world w
– Here, expressed by a number in [0, 100]
• Simple goals can still be specified
– Worlds that satisfy the goal have utility 100
– All other worlds have utility 0
• Utilities can be more complicated
– For example, in the robot delivery domain, they could involve
• Did the robot reach the target room?
• Time taken
• Amount of damage
• Energy left
Utility for the Robot Example
• What would be a reasonable utility function for our robot?
• Which are the best and worst scenarios?
[Figure: the decision tree again, with accident probabilities on the chance branches (0.01/0.99 for the long way, 0.2/0.8 for the short way) and a utility to be assigned to each leaf.]
Utility / Preferences
• Utility: a measure of the desirability of possible worlds to an agent
– Let U be a real-valued function such that U(w) represents the agent's degree of preference for world w.
Would this be a reasonable utility function for our robot?

Which way   Accident   Wear Pads   Utility   World
short       true       true        35        w0, moderate damage
short       false      true        95        w1, reaches room, quick, extra weight
long        true       true        30        w2, moderate damage, low energy
long        false      true        75        w3, reaches room, slow, extra weight
short       true       false       3         w4, severe damage
short       false      false       100       w5, reaches room, quick
long        true       false       0         w6, severe damage, low energy
long        false      false       80        w7, reaches room, slow
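The utility table above can be encoded as a small lookup for later calculations (a minimal sketch in Python; the variable names are illustrative, the values are taken from the table):

```python
# Utility U(w) for each world, keyed by (which_way, accident, wear_pads).
utility = {
    ("short", True,  True):  35,   # w0, moderate damage
    ("short", False, True):  95,   # w1, reaches room, quick, extra weight
    ("long",  True,  True):  30,   # w2, moderate damage, low energy
    ("long",  False, True):  75,   # w3, reaches room, slow, extra weight
    ("short", True,  False):  3,   # w4, severe damage
    ("short", False, False): 100,  # w5, reaches room, quick
    ("long",  True,  False):  0,   # w6, severe damage, low energy
    ("long",  False, False): 80,   # w7, reaches room, slow
}

best = max(utility, key=utility.get)   # best scenario: no pads, short way, no accident
worst = min(utility, key=utility.get)  # worst scenario: no pads, long way, accident
```

This directly answers the "best and worst scenarios" question: w5 (utility 100) and w6 (utility 0).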
Utility: Simple Goals
– Can simple (boolean) goals still be specified?
– Recall: worlds that satisfy the goal get utility 100, all other worlds get utility 0

Which way   Accident   Wear Pads   Utility
long        true       true        ?
long        true       false       ?
long        false      true        ?
long        false      false       ?
short       true       true        ?
short       true       false       ?
short       false      true        ?
short       false      false       ?
Utility for the Robot Example
• Now, how do we combine utility and probability to decide what to do?
[Figure: the decision tree with a probability on each chance branch (0.01/0.99 for the long way, 0.2/0.8 for the short way) and a utility at each leaf (35, 95, 30, 75, 3, 100, 0, 80).]
Optimal decisions: combining utility and probability
• Each set of decisions defines a probability distribution over possible outcomes
• Each outcome has a utility
• For each set of decisions, we need to know its expected utility
– the value for the agent of achieving a certain probability distribution over outcomes (possible worlds)
• The expected utility of a set of decisions is obtained by
• weighting the utility of the relevant possible worlds by their probability
• We want to find the decision with maximum expected utility
[Figure: e.g., utilities 35 and 95 reached with probabilities 0.2 and 0.8. What is the expected value of this scenario?]
Expected utility of a decision
• The expected utility of decision D = di is

E(U | D = di) = Σ_{w ⊨ D = di} P(w) U(w)

• What is the expected utility of WearPads = yes, Way = short?
[Figure: the decision tree with probabilities (0.01/0.99 for long, 0.2/0.8 for short) and utilities (35, 95, 30, 75, 3, 100, 0, 80) at the leaves.]
Expected utility of a decision
• The expected utility of decision D = di is

E(U | D = di) = Σ_{w ⊨ D = di} P(w) U(w)

• What is the expected utility of WearPads = yes, Way = short?
0.2 * 35 + 0.8 * 95 = 83

E[U | D] for each decision:
(pads, short):     0.2 * 35 + 0.8 * 95   = 83
(pads, long):      0.01 * 30 + 0.99 * 75 = 74.55
(no pads, short):  0.2 * 3 + 0.8 * 100   = 80.6
(no pads, long):   0.01 * 0 + 0.99 * 80  = 79.2
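The definition above can be checked numerically by enumerating the possible worlds for each decision (a minimal sketch in Python; names are illustrative, the probabilities and utilities come from the example):

```python
# Expected utility of each (which_way, wear_pads) decision:
# E(U | D=d) = sum over worlds w satisfying D=d of P(w) * U(w).

# P(Accident = true | WhichWay), from the example.
p_accident = {"short": 0.2, "long": 0.01}

# U(w) keyed by (which_way, accident, wear_pads), from the utility table.
utility = {
    ("short", True,  True):  35, ("short", False, True):  95,
    ("long",  True,  True):  30, ("long",  False, True):  75,
    ("short", True,  False):  3, ("short", False, False): 100,
    ("long",  True,  False):  0, ("long",  False, False): 80,
}

def expected_utility(way: str, pads: bool) -> float:
    p = p_accident[way]
    return p * utility[(way, True, pads)] + (1 - p) * utility[(way, False, pads)]

# The four expected utilities from the slide: 83, 74.55, 80.6, 79.2.
eu = {(way, pads): expected_utility(way, pads)
      for way in ("short", "long") for pads in (True, False)}
```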
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Single Action vs. Sequence of Actions
• Single action (aka one-off decisions)
– One or more primitive decisions that can be treated as a single macro decision, to be made before acting
– E.g., "WearPads" and "WhichWay" can be combined into the macro decision (WearPads, WhichWay) with domain {yes, no} × {long, short}
• Sequence of actions (sequential decisions)
– Repeat:
• make observations
• decide on an action
• carry out the action
– The agent has to take actions without knowing what the future brings
• This is fundamentally different from everything we've seen so far
• Planning was sequential, but we could still think first and then act
Optimal single-stage decision
• Given a single (macro) decision variable D
– the agent can choose D = di for any value di ∈ dom(D)
What is the optimal decision in the example?
[Figure: the decision tree with conditional probabilities and utilities.]

Expected utility of each decision:
(pads, short):     0.2 * 35 + 0.8 * 95   = 83
(pads, long):      0.01 * 30 + 0.99 * 75 = 74.55
(no pads, short):  0.2 * 3 + 0.8 * 100   = 80.6
(no pads, long):   0.01 * 0 + 0.99 * 80  = 79.2

Which is the optimal decision here?
Optimal decision in the robot delivery example
[Figure: the same tree, with expected utilities 83, 74.55, 80.6, and 79.2 for the four decisions.]
Best decision: (wear pads, short way), with expected utility 83
Single-Stage Decision Networks
• Extend belief networks with:
– Decision nodes, whose value the agent chooses
• Parents: only other decision nodes allowed
• Domain is the set of possible actions
• Drawn as a rectangle
– Exactly one utility node
• Parents: all random and decision variables on which the utility depends
• Does not have a domain
• Drawn as a diamond
• Explicitly shows dependencies
– E.g., which variables affect the probability of an accident?
Types of nodes in decision networks
• A random variable is drawn as an ellipse.
– Arcs into the node represent probabilistic dependence
– As in Bayesian networks: a random variable is conditionally independent of its non-descendants given its parents
• A decision variable is drawn as a rectangle.
– Arcs into the node represent information available when the decision is made
• A utility node is drawn as a diamond.
– Arcs into the node represent variables that the utility depends on
– It specifies a utility for each instantiation of its parents
Example Decision Network
Decision nodes do not have an associated table.
The utility node does not have a domain.

Which way   Accident   Wear Pads   Utility
long        true       true        30
long        true       false       0
long        false      true        75
long        false      false       80
short       true       true        35
short       true       false       3
short       false      true        95
short       false      false       100

Which Way W   Accident A   P(A|W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8
Computing the optimal decision: we can use VE
• Denote
– the random variables as X1, …, Xn
– the decision variable as D
– the parents of node N as pa(N)
• To find the optimal decision we can use VE:
1. Create a factor for each conditional probability and for the utility
2. Sum out all random variables, one at a time
• This creates a factor on D that gives the expected utility for each di
3. Choose the di with the maximum value in the factor

E(U | D) = Σ_{X1, …, Xn} P(X1, …, Xn | D) U(pa(U))
         = Σ_{X1, …, Xn} [ Π_i P(Xi | pa(Xi)) ] U(pa(U))

where each pa(Xi) may include decision variables.
VE Example: Step 1, create initial factors

Abbreviations: W = Which Way, P = Wear Pads, A = Accident

f1(A,W), from P(A|W):
Which Way W   Accident A   f1(A,W)
long          true         0.01
long          false        0.99
short         true         0.2
short         false        0.8

f2(A,W,P), from the utility table:
Which way W   Accident A   Pads P   f2(A,W,P)
long          true         true     30
long          true         false    0
long          false        true     75
long          false        false    80
short         true         true     35
short         true         false    3
short         false        true     95
short         false        false    100

E(U) = Σ_A P(A|W) U(A,W,P) = Σ_A f1(A,W) f2(A,W,P)
VE example: step 2, sum out A
Step 2a: compute the product f(A,W,P) = f1(A,W) × f2(A,W,P)

f(A=a, W=w, P=p) = f1(A=a, W=w) × f2(A=a, W=w, P=p)

Which way W   Accident A   Pads P   f(A,W,P)
long          true         true     0.01 * 30
long          true         false    0.01 * 0
long          false        true     0.99 * 75
long          false        false    0.99 * 80
short         true         true     0.2 * 35
short         true         false    0.2 * 3
short         false        true     0.8 * 95
short         false        false    0.8 * 100
VE example: step 2, sum out A
Step 2b: sum A out of the product f(A,W,P):

f3(W,P) = Σ_A f(A,W,P)

Which way W   Pads P   f3(W,P)
long          true     0.01 * 30 + 0.99 * 75 = 74.55
long          false    0.01 * 0 + 0.99 * 80  = 79.2
short         true     0.2 * 35 + 0.8 * 95   = 83
short         false    0.2 * 3 + 0.8 * 100   = 80.6

The final factor encodes the expected utility of each decision.
• Thus, taking the short way and wearing pads is the best choice, with an expected utility of 83
VE example: step 3, choose the decision with maximum E(U)

Which way W   Pads P   f3(W,P)
long          true     74.55
long          false    79.2
short         true     83     <- maximum
short         false    80.6

The optimal decision is WearPads = true, WhichWay = short.
Variable Elimination for Single-Stage Decision Networks: Summary
1. Create a factor for each conditional probability and for the utility
2. Sum out all random variables, one at a time
– This creates a factor on D that gives the expected utility for each di
3. Choose the di with the maximum value in the factor
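The three steps above can be sketched with explicit factors (a minimal sketch in Python; factors are plain dicts and the names are illustrative):

```python
from itertools import product

# Step 1: initial factors from the example network.
# f1(A, W) = P(A | W); f2(A, W, P) = U(A, W, P).
f1 = {(True, "long"): 0.01, (False, "long"): 0.99,
      (True, "short"): 0.2, (False, "short"): 0.8}
f2 = {(True, "long", True): 30,   (True, "long", False): 0,
      (False, "long", True): 75,  (False, "long", False): 80,
      (True, "short", True): 35,  (True, "short", False): 3,
      (False, "short", True): 95, (False, "short", False): 100}

# Step 2: multiply the factors and sum out the random variable A.
# The resulting factor f3(W, P) gives the expected utility of each decision.
f3 = {}
for w, p in product(("long", "short"), (True, False)):
    f3[(w, p)] = sum(f1[(a, w)] * f2[(a, w, p)] for a in (True, False))

# Step 3: choose the decision with the maximum value in the factor.
best = max(f3, key=f3.get)  # short way, wearing pads
```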
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Sequential Decision Problems
• An intelligent agent doesn't make a multi-step decision and carry it out blindly
– It would take the new observations it makes into account
• A more typical scenario:
– The agent observes, acts, observes, acts, …
• Subsequent actions can depend on what is observed
– What is observed often depends on previous actions
– Often the sole reason for carrying out an action is to provide information for future actions
• For example: diagnostic tests, spying
• General decision networks:
– Just like single-stage decision networks, with one exception: the parents of decision nodes can include random variables
Sequential decisions: simplest possible case
• Only one decision! (but different from one-off decisions)
• It is early in the morning. Shall I take my umbrella today? (I'll have to go for a long walk at noon)
• Relevant random variables?
Sequential Decision Problems: Example
• In our fire alarm domain
• If there is a report, you can decide to call the fire department
• Before doing that, you can decide to check whether you can see smoke, but this takes time and will delay calling
• A decision (e.g., Call) can depend on a random variable (e.g., SeeSmoke)
[Figure: decision network. Decision nodes: the agent decides; chance nodes: chance decides.]
• Each decision Di has an information set of variables pa(Di), whose values will be known at the time decision Di is made
– pa(CheckSmoke) = {Report}
– pa(Call) = {Report, CheckSmoke, SeeSmoke}
Sequential Decision Problems
• What should an agent do?
– The agent observes, acts, observes, acts, …
– Subsequent actions can depend on what is observed
• What is observed often depends on previous actions
• The agent needs a conditional plan of what it will do given every possible set of circumstances
• We will formalize this conditional plan as a policy
Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions
δi : dom(pa(Di)) → dom(Di)

This policy means that when the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

A decision function needs to specify a value for each instantiation of its parents. There are 2^2 = 4 possible decision functions δcs for CheckSmoke:

Report   δcs1   δcs2   δcs3   δcs4
true     T      T      F      F
false    T      F      T      F
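The counting argument can be checked by brute-force enumeration (a minimal sketch; True/False stand for the two possible values of a binary decision):

```python
from itertools import product

def decision_functions(n_parent_instantiations: int):
    """All decision functions for a binary decision variable: one
    True/False choice per instantiation of the decision's parents."""
    return list(product((True, False), repeat=n_parent_instantiations))

# CheckSmoke has one binary parent (Report): 2 instantiations, 2^2 = 4 functions.
delta_cs = decision_functions(2)

# Call has three binary parents (Report, CheckSmoke, SeeSmoke):
# 8 instantiations, 2^8 = 256 functions.
delta_call = decision_functions(8)
```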
Policies for Sequential Decision Problems

Definition (Policy)
A policy is a sequence δ1, …, δn of decision functions
δi : dom(pa(Di)) → dom(Di)

When the agent has observed o ∈ dom(pa(Di)), it will do δi(o).

Call has three binary parents (Report, CheckSmoke, SeeSmoke), so there are 2^8 = 256 possible decision functions δcall for Call. Columns below are the eight instantiations of (R, CS, SS), from (t,t,t) to (f,f,f):

           ttt  ttf  tft  tff  ftt  ftf  fft  fff
δcall1     T    T    T    T    T    T    T    T
δcall2     T    T    T    T    T    T    T    F
δcall3     T    T    T    T    T    T    F    T
δcall4     T    T    T    T    T    T    F    F
δcall5     T    T    T    T    T    F    T    T
…          …    …    …    …    …    …    …    …
δcall256   F    F    F    F    F    F    F    F
Policies for Sequential Decision Problems
• Policies for sequential decisions are the counterpart of single decisions for one-off decisions
– They are just much more complex, because they tell the agent what to do at any step of the decision sequence, given any possible combination of available information

Sample decision (one-off): one of the four macro decisions d1–d4

      Which way   Wear Pads
d1    long        true
d2    long        false
d3    short       true
d4    short       false

Sample policy (sequential): a pair of decision functions, e.g. (δcs2, δcallk)

Report   δcs1   δcs2   δcs3   δcs4
true     t      t      f      f
false    t      f      t      f

Report   CheckS   SeeS    δcall1   …   δcallk   δcalln
true     true     true    true         true     true
true     true     false   false        false    true
true     false    true    true         true     true
true     false    false   false        false    true
false    true     true    true         true     true
false    true     false   false        false    true
false    false    true    false        false    true
false    false    false   false        false    true
How many policies are there?
• If a decision D has k binary parents, there are 2^k different assignments of values to the parents
• If there are b possible values for a decision variable with k binary parents, there are b^(2^k) different decision functions for that variable
– because there are 2^k possible instantiations of the parents, and for every instantiation the decision function could pick any of the b values
• If there are d decision variables, each with k binary parents and b possible values
– There are (b^(2^k))^d different policies
– because there are b^(2^k) possible decision functions for each decision, and a policy is a combination of d such decision functions
• We want to find the "most promising" policy given the inherent uncertainty, i.e. the policy with the maximum expected utility
– But of course we want to avoid enumerating all the policies: VE
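The counting formula above is easy to check numerically (a trivial sketch; the sample numbers are illustrative):

```python
def num_policies(b: int, k: int, d: int) -> int:
    """Number of policies for d decision variables, each with k binary
    parents and b possible values: (b^(2^k))^d."""
    return (b ** (2 ** k)) ** d

# CheckSmoke alone: b=2 values, k=1 binary parent -> 2^2 = 4 decision functions.
single_cs = num_policies(2, 1, 1)
# Call alone: b=2 values, k=3 binary parents -> 2^8 = 256 decision functions.
single_call = num_policies(2, 3, 1)
```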
Lecture Overview
• Single-Stage Decision Problems
– Utilities and optimal decisions
– Single-stage decision networks
– Variable elimination (VE) for computing the optimal decision
• Sequential Decision Problems
– General decision networks
– Policies
– Finding optimal policies with VE
Expected Value of a Policy
• Like one-off decisions, policies are selected based on their expected utility
• Each possible world w has a probability P(w) and a utility U(w)
• The expected utility of policy π is

E(U | π) = Σ_{w ⊨ π} P(w) U(w)     (the sum ranges over the possible worlds that satisfy the policy)

• The optimal policy is one with the maximum expected utility.
When does a possible world satisfy a policy?
• Possible world w satisfies policy π, written w ⊨ π, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).

Policy π:
Decision function δcs2:
Report   CheckSmoke
true     true
false    false

Decision function δcall20:
Report   CheckSmoke   SeeSmoke   Call
true     true         true       false
true     true         false      false
true     false        true       true
true     false        false      false
false    true         true       true
false    true         false      false
false    false        true       false
false    false        false      false

Possible world w2 (values of all VARs):
Fire = true, Tampering = false, Alarm = true, Leaving = true, Report = true, Smoke = true, SeeSmoke = true, CheckSmoke = true, Call = true

w2 ⊨ π ?
One last operation on factors: maxing out a variable
• Maxing out a variable is similar to marginalization
– But instead of taking the sum of some values, we take the max

(max_{X1} f)(X2, …, Xj) = max_{x1 ∈ dom(X1)} f(X1 = x1, X2, …, Xj)

Example: maxB f3(A,B,C) = f4(A,C)

B   A   C   f3(A,B,C)
t   t   t   0.03
t   t   f   0.07
f   t   t   0.54
f   t   f   0.36
t   f   t   0.06
t   f   f   0.14
f   f   t   0.48
f   f   f   0.32

A   C   f4(A,C)
t   t   0.54
t   f   0.36
f   t   0.48
f   f   0.32
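Maxing out can be sketched as a small operation on dict-based factors (a minimal sketch; the factor values are taken from the example table):

```python
# Max out variable B from factor f3, keyed by (b, a, c):
# f4(a, c) = max over b of f3(b, a, c).
f3 = {("t", "t", "t"): 0.03, ("t", "t", "f"): 0.07,
      ("f", "t", "t"): 0.54, ("f", "t", "f"): 0.36,
      ("t", "f", "t"): 0.06, ("t", "f", "f"): 0.14,
      ("f", "f", "t"): 0.48, ("f", "f", "f"): 0.32}

f4 = {}
for (b, a, c), value in f3.items():
    f4[(a, c)] = max(f4.get((a, c), float("-inf")), value)
```

Taking an arg max instead of a max would additionally record which value of B achieved the maximum for each (a, c), which is how decision functions are recovered later.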
The no-forgetting property
• A decision network has the no-forgetting property if
– Decision variables are totally ordered: D1, …, Dm
– If a decision Di comes before Dj, then
• Di is a parent of Dj
• any parent of Di is also a parent of Dj
Idea for finding optimal policies with VE
• Idea for finding optimal policies with variable elimination (VE): dynamic programming: precompute optimal future decisions
– Consider the last decision D to be made
• Find the optimal decision D = d for each instantiation of D's parents
– For each instantiation of D's parents, this is just a single-stage decision problem
• Create a factor of these maximum values: max out D
– I.e., for each instantiation of the parents, what is the best utility I can achieve by making this last decision optimally?
• Recurse to find the optimal policy for the reduced network (now with one less decision)
Finding optimal policies with VE
1. Create a factor for each CPT and a factor for the utility
2. While there are still decision variables
– 2a: Sum out random variables that are not parents of a decision node
• E.g., Tampering, Fire, Alarm, Smoke, Leaving
– 2b: Max out the last decision variable D in the total ordering
• Keep track of its decision function
3. Sum out any remaining variables: this is the expected utility of the optimal policy.

This is Algorithm VE_DN in P&M, Section 9.3.3, p. 393
Eliminating the decision variables: step 2b details
• Select the variable D that corresponds to the latest decision to be made
– this variable will appear in only one factor with (some of) its parents
• Eliminate D by maximizing. This returns:
– The optimal decision function for D: arg maxD f
– A new factor to use in VE: maxD f
• Repeat until there are no more decision nodes.

Example: eliminate CheckSmoke

Report   CheckSmoke   Value
true     true         -5.0
true     false        -5.6
false    true         -23.7
false    false        -17.5

New factor (max over CheckSmoke):
Report   Value
true     -5.0
false    -17.5

Decision function (arg max over CheckSmoke):
Report   CheckSmoke
true     true
false    false
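The CheckSmoke elimination above can be sketched in the same dict-based style (a minimal sketch; the Value factor is taken from the example table):

```python
# Factor over (Report, CheckSmoke) remaining after earlier eliminations.
value = {(True, True): -5.0, (True, False): -5.6,
         (False, True): -23.7, (False, False): -17.5}

new_factor = {}         # max over CheckSmoke, for each value of Report
decision_function = {}  # arg max: the optimal CheckSmoke choice per Report value
for report in (True, False):
    best_cs = max((True, False), key=lambda cs: value[(report, cs)])
    decision_function[report] = best_cs
    new_factor[report] = value[(report, best_cs)]
```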
Computational complexity of VE for finding optimal policies
• We saw: for d decision variables (each with k binary parents and b possible actions), there are (b^(2^k))^d policies
– All combinations of the b^(2^k) decision functions per decision
• Variable elimination saves the final exponent:
– Dynamic programming: consider each decision function only once
– Resulting complexity: O(d * b^(2^k))
– Much faster than enumerating policies (or searching in policy space), but still doubly exponential
– Solution: approximation algorithms for finding optimal policies
Value of Information
• What would help the agent make a better Umbrella decision?
• The value of information of a random variable X for decision D is:
• the utility of the network with an arc from X to D, minus the utility of the network without the arc.
Value of Information (cont.)
• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect weather forecast worth?
• Original maximum expected utility: 77
• Maximum expected utility when we know Weather: 91
• A better forecast is worth at most: 91 - 77 = 14
Value of Information
• The value of information provides a bound on how much you should be prepared to pay for a sensor. How much is a perfect fire sensor worth?
• Original maximum expected utility: -22.6
• Maximum expected utility when we know Fire: -2
• A perfect fire sensor is worth: -2 - (-22.6) = 20.6
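Both calculations above are instances of the same difference (a trivial sketch; the function name is illustrative):

```python
def value_of_information(meu_with_arc: float, meu_without_arc: float) -> float:
    """Value of information of X for decision D: the expected utility of the
    network with the arc from X to D minus that of the network without it."""
    return meu_with_arc - meu_without_arc

forecast_worth = value_of_information(91, 77)        # perfect weather forecast
fire_sensor_worth = value_of_information(-2, -22.6)  # perfect fire sensor
```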
Learning Goals For Decision Under Uncertainty
• One-off decisions
– Compare and contrast stochastic single-stage (one-off) decisions vs. multi-stage (sequential) decisions
– Define a utility function on possible worlds
– Define and compute optimal one-off decisions
– Represent one-off decisions as single-stage decision networks
– Compute optimal decisions by variable elimination
• Sequential decision networks
– Represent sequential decision problems as decision networks
– Explain the no-forgetting property
• Policies
– Verify whether a possible world satisfies a policy
– Define the expected utility of a policy
– Compute the number of policies for a decision problem
– Compute the optimal policy by variable elimination
• Define and compute the value of information