
Transcript of Artificial Intelligence

Page 1: Artificial  Intelligence

ARTIFICIAL INTELLIGENCE

CH 17

MAKING COMPLEX DECISIONS

Page 2: Artificial  Intelligence

GROUP (9)

Team Members: Ahmed Helal Eid, Mina Victor William

Supervised by: Dr. Nevin M. Darwish

Page 3: Artificial  Intelligence

AGENDA

Introduction
Sequential Decision Problems
Optimality in Sequential Decision Problems
Value Iteration
The Value Iteration Algorithm
Policy Iteration

Page 4: Artificial  Intelligence

INTRODUCTION

PREVIOUSLY IN CH16

MAKING SIMPLE DECISIONS

Concerned with episodic decision problems, in which the utility of each action's outcome was well known.

Episodic environment: the agent's experience is divided into atomic episodes, each consisting of the agent perceiving and then performing a single action.

Page 5: Artificial  Intelligence

INTRODUCTION

IN THIS CHAPTER

The computational issues involved in making decisions in a stochastic environment.

Sequential decision problems, in which the agent's utility depends on a sequence of decisions.

Sequential decision problems, which include utilities, uncertainty, and sensing, generalize the search and planning problems as special cases.

Page 6: Artificial  Intelligence

SEQUENTIAL DECISION PROBLEMS

Page 7: Artificial  Intelligence

SEQUENTIAL DECISION PROBLEMS

What if the environment were deterministic?

[Figure: the 4x3 grid world, with the start state at (1,1), an obstacle at (2,2), a +1 exit at (4,3), and a -1 exit at (4,2)]

Unfortunately, the environment does not go along with this assumption.

The actions A(s) available in every state are Up, Down, Left, and Right.

Page 8: Artificial  Intelligence

SEQUENTIAL DECISION PROBLEMS

[Figure: model for stochastic motion. Each action achieves the intended effect with probability 0.8, and moves at right angles to the intended direction with probability 0.1 to each side]

[Up, Up, Right, Right, Right]: $0.8^5 = 0.32768$

[Right, Right, Up, Up, Right]: $0.1^4 \times 0.8 = 0.00008$
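A quick numeric check of the two probabilities above, as a minimal Python sketch under the motion model just described:

# Each action achieves its intended effect with probability 0.8 and
# slips at right angles with probability 0.1 per side.

# All five actions achieve their intended effect:
print(0.8 ** 5)        # 0.32768

# Four right-angle slips plus one intended move:
print(0.1 ** 4 * 0.8)  # about 8.0e-05, i.e. 0.00008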

Page 9: Artificial  Intelligence

SEQUENTIAL DECISION PROBLEMS

Transition model $T(s, a, s')$: the probability of reaching state $s'$ if action $a$ is done in state $s$.

Markovian transitions: the probability of reaching $s'$ from $s$ depends only on $s$, not on the earlier state history.

Utility function: depends on a sequence of states, i.e. the environment history.

Reward $R(s)$: the agent receives a reward in each state (positive or negative).

[Figure: the 4x3 grid world with $R(s) = -0.04$ for every nonterminal state, and +1 and -1 at the two exits]

For 10 steps to the goal: Utility $= (-0.04 \times 10) + 1 = 0.6$.

Page 10: Artificial  Intelligence

MARKOV DECISION PROCESS (MDP)

We use MDPs to solve sequential decision problems.

We eventually want to find the best choice of action for each state.

Consists of:

a set of actions $A(s)$, the actions available in each state $s$;

a transition model $P(s' \mid s, a)$, describing the probability of reaching $s'$ using action $a$ in $s$; transitions are Markovian, depending only on $s$ and not on previous states;

a reward function $R(s)$, the reward an agent receives for arriving in state $s$.
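As a sketch of how these three components might be represented in Python (the names and types are ours, just for concreteness):

from typing import List, Tuple

State = Tuple[int, int]                 # e.g. grid coordinates
Action = str                            # e.g. "Up", "Down", "Left", "Right"

def A(s: State) -> List[Action]: ...    # actions available in state s
def P(s2: State, s: State, a: Action) -> float: ...  # transition model
def R(s: State) -> float: ...           # reward for arriving in state s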

Page 11: Artificial  Intelligence

SEQUENTIAL DECISION PROBLEMS

What does the solution to a problem look like?

Policy ($\pi$): a solution must specify what the agent should do in any state the agent might reach.

$\pi(s)$: the action recommended by the policy $\pi$ for state $s$.

Optimal policy ($\pi^*$): the policy that yields the highest expected utility.

Page 12: Artificial  Intelligence

CONTINUED

[Figure: optimal policies for two reward ranges. For $R(s) < -1.6284$, life is so painful that the agent heads straight for the nearest exit, even the -1 exit; for $-0.4278 < R(s) < -0.0850$, the agent takes the shortest route to the +1 exit, willing to risk the -1 exit]

Page 13: Artificial  Intelligence

CONTINUED

[Figure: optimal policies for two further reward ranges. For $-0.0221 < R(s) < 0$, the optimal policy takes no risks at all; for $R(s) > 0$, the agent avoids both exits entirely]

Page 14: Artificial  Intelligence

THE HORIZON

Is there a finite or infinite horizon for decision making?

Finite horizon: there is a fixed time N after which nothing matters (the game is over). The optimal policy is non-stationary.

Page 15: Artificial  Intelligence

EXAMPLE OF FINITE HORIZON

With a finite horizon, the optimal action in a given state can change over time.

[Figure: the 4x3 grid world (start state, +1 and -1 exits) with horizon N = 3]

Page 16: Artificial  Intelligence

OPTIMALITY IN SEQUENTIAL DECISION PROBLEMS

Is there a finite or infinite horizon for decision making?

Infinite horizon: no fixed deadline (the time spent at a state doesn't matter). The optimal policy is stationary.

Page 17: Artificial  Intelligence

EXAMPLE OF INFINITE HORIZON

With an infinite horizon, the optimal action in a given state does not change over time.

[Figure: the 4x3 grid world (start state, +1 and -1 exits) with horizon N = 100]

Page 18: Artificial  Intelligence

OPTIMALITY IN SEQUENTIAL DECISION PROBLEMS

Is there a finite or infinite horizon for decision making?

We will mainly use infinite-horizon utility functions, because there is no reason to behave differently in the same state.

Hence, the optimal action depends only on the current state, and the optimal policy is stationary.

Page 19: Artificial  Intelligence

Optimality in sequential decision problems

Page 20: Artificial  Intelligence

OPTIMALITY IN SEQUENTIAL DECISION PROBLEMS

How do we calculate the utility of a state sequence?

Additive rewards:

$$U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + R(s_1) + R(s_2) + \cdots$$

Discounted rewards:

$$U_h([s_0, s_1, s_2, \ldots]) = R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots$$

The discount factor $\gamma$ is between 0 and 1.
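As a small illustration, here is a Python sketch of the two definitions (the function names are ours), applied to the 10-step example from the earlier slide:

def additive_utility(rewards):
    # U_h = R(s0) + R(s1) + R(s2) + ...
    return sum(rewards)

def discounted_utility(rewards, gamma):
    # U_h = R(s0) + gamma*R(s1) + gamma^2*R(s2) + ...
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Ten -0.04 steps and then the +1 exit:
rewards = [-0.04] * 10 + [1.0]
print(additive_utility(rewards))         # 0.6
print(discounted_utility(rewards, 0.9))  # smaller: later rewards count less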

Page 21: Artificial  Intelligence

OPTIMALITY IN SEQUENTIAL DECISION PROBLEMS

What if there is no terminal state, or the agent never reaches one?

If the environment doesn't contain a terminal state, or if the agent never reaches one, then all environment histories will be infinitely long, and utilities with additive rewards will generally be infinite.

Page 22: Artificial  Intelligence

OPTIMALITY IN SEQUENTIAL DECISION PROBLEMS

What if there is no terminal state, or the agent never reaches one?

Solution with discounted rewards: the utility of an infinite sequence is finite if the rewards are bounded by $R_{\max}$ and $\gamma < 1$:

$$U_h([s_0, s_1, \ldots]) = \sum_{t=0}^{\infty} \gamma^t R(s_t) \le \sum_{t=0}^{\infty} \gamma^t R_{\max} = \frac{R_{\max}}{1-\gamma}$$

Page 23: Artificial  Intelligence

OPTIMAL POLICIES FOR UTILITIES OF STATES

The expected utility obtained by executing policy $\pi$ starting in state $s$:

$$U^{\pi}(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t)\right]$$

The optimal policy $\pi^*$ has the highest expected utility and is given by

$$\pi^*(s) = \operatorname{argmax}_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')$$

This sets $\pi^*(s)$ to the action $a$ in $A(s)$ that gives the highest expected utility.

Page 24: Artificial  Intelligence

OPTIMAL POLICIES FOR UTILITIES OF STATES

The optimal policy is actually independent of the start state: the action sequences taken will differ, but the policy itself never changes.

This comes from the nature of a Markovian decision problem with discounted utilities over infinite horizons.

$U(s)$, the utility function over states, is likewise independent of the start state.

Page 25: Artificial  Intelligence

OPTIMAL POLICIES FOR UTILITIES OF STATES

The utilities are higher for states closer to the +1 exit, because fewer steps are required to reach the exit.

Page 26: Artificial  Intelligence

ALGORITHMS FOR CALCULATING THE OPTIMAL POLICY

Value iteration

Policy iteration

Page 27: Artificial  Intelligence

VALUE ITERATION ALGORITHM

The utilities are hard to calculate directly because the equations are non-linear (the max operation is not linear), so we use an iterative algorithm.

Basic idea: start from initial values for all states, then update each state from its neighbours' values until they reach equilibrium.

Page 28: Artificial  Intelligence

VALUE ITERATION ALGORITHM
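At the heart of the algorithm is the Bellman update, which recomputes each state's utility from the current utilities of its neighbours:

$$U_{i+1}(s) \leftarrow R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U_i(s')$$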

Page 29: Artificial  Intelligence

VALUE ITERATION ALGORITHM
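To make the loop concrete, here is a minimal self-contained Python sketch of value iteration on the 4x3 world, assuming the motion model and rewards used throughout these slides (0.8/0.1/0.1 motion, R(s) = -0.04, gamma = 1). The helper names are ours, not from the slides:

GOAL, PIT, WALL = (4, 3), (4, 2), (2, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
ACTIONS = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def move(s, a):
    # Deterministic effect of action a in state s; bumping into the
    # obstacle or the grid boundary leaves the agent where it is.
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if nxt in STATES else s

def transitions(s, a):
    # P(s' | s, a): 0.8 intended effect, 0.1 for each right-angle slip.
    left, right = PERP[a]
    return [(0.8, move(s, a)), (0.1, move(s, left)), (0.1, move(s, right))]

def R(s):
    return 1.0 if s == GOAL else -1.0 if s == PIT else -0.04

def value_iteration(gamma=1.0, eps=1e-6):
    U = {s: 0.0 for s in STATES}
    while True:
        U_new, delta = {}, 0.0
        for s in STATES:
            if s in (GOAL, PIT):
                U_new[s] = R(s)            # terminal: utility is its reward
            else:                          # Bellman update
                U_new[s] = R(s) + gamma * max(
                    sum(p * U[s2] for p, s2 in transitions(s, a))
                    for a in ACTIONS)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        # Termination: the error bound on the next slide degenerates at
        # gamma = 1, so this sketch uses a plain threshold on delta.
        if delta < eps:
            return U

print(value_iteration()[(1, 1)])           # roughly 0.705, matching the book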

Page 30: Artificial  Intelligence

USING VALUE ITERATION ON THE EXAMPLE

Page 31: Artificial  Intelligence

VALUE ITERATION ALGORITHM

When do we terminate?

When the Bellman update is small, the error compared with the true utility function is also small.

Why use $\epsilon(1-\gamma)/\gamma$? Recall that if $\gamma < 1$ over an infinite horizon, $U_h$ is bounded by $R_{\max}/(1-\gamma)$ when summed to infinity.

If $\|U_{i+1} - U_i\| < \epsilon(1-\gamma)/\gamma$, then $\|U_{i+1} - U\| < \epsilon$.

Page 32: Artificial  Intelligence

POLICY ITERATION

The policy iteration algorithm alternates two steps:

Policy evaluation: given a policy $\pi_i$, calculate $U_i = U^{\pi_i}$, the utility of each state if $\pi_i$ were to be executed.

Policy improvement: calculate a new policy $\pi_{i+1}$ using one-step look-ahead based on $U_i$:

$$\pi_{i+1}(s) = \operatorname{argmax}_{a \in A(s)} \sum_{s'} T(s, a, s')\, U_i(s')$$

Page 33: Artificial  Intelligence

POLICY ITERATION

Algorithm:

Start with an initial policy $\pi_0$, then repeat:

Policy evaluation: for each state, calculate $U_i$ given policy $\pi_i$, using a simplified version of the Bellman update equation; no max is needed, since the policy fixes the action (see the equation below).

Policy improvement: for each state, if the max utility over the actions gives a better result than $\pi(s)$, set $\pi(s)$ to that action.

Until the policy is unchanged.
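The simplified, max-free update referred to above fixes the action in each state to $\pi_i(s)$, making the equations linear:

$$U_i(s) = R(s) + \gamma \sum_{s'} T(s, \pi_i(s), s')\, U_i(s')$$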

Page 34: Artificial  Intelligence

POLICY ITERATION ALGORITHM
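For concreteness, a matching Python sketch of policy iteration on the same 4x3 world (helper names ours). Policy evaluation here uses a fixed number of sweeps of the simplified update, which is one reasonable choice among several:

GOAL, PIT, WALL = (4, 3), (4, 2), (2, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
ACTIONS = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def move(s, a):
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if nxt in STATES else s

def T(s, a):
    # T(s, a, s') as a list of (probability, s') pairs.
    left, right = PERP[a]
    return [(0.8, move(s, a)), (0.1, move(s, left)), (0.1, move(s, right))]

def R(s):
    return 1.0 if s == GOAL else -1.0 if s == PIT else -0.04

def expected_U(s, a, U):
    return sum(p * U[s2] for p, s2 in T(s, a))

def policy_iteration(gamma=1.0, sweeps=50):
    pi = {s: "Up" for s in STATES}         # arbitrary initial policy
    while True:
        # Policy evaluation: sweeps of the simplified (max-free) update.
        U = {s: 0.0 for s in STATES}
        for _ in range(sweeps):
            for s in STATES:
                U[s] = R(s) if s in (GOAL, PIT) else \
                       R(s) + gamma * expected_U(s, pi[s], U)
        # Policy improvement: one-step look-ahead on U.
        unchanged = True
        for s in STATES:
            if s in (GOAL, PIT):
                continue
            best = max(ACTIONS, key=lambda a: expected_U(s, a, U))
            if expected_U(s, best, U) > expected_U(s, pi[s], U):
                pi[s], unchanged = best, False
        if unchanged:
            return pi

print(policy_iteration()[(1, 1)])          # "Up" in the book's optimal policy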

Page 35: Artificial  Intelligence

Questions?

Page 36: Artificial  Intelligence

Thanks