
Reasoning Under Uncertainty Over Time

CS 486/686: Introduction to Artificial Intelligence, Fall 2013

Page 2: Outline

• Reasoning under uncertainty over time

- Hidden Markov Models

- Dynamic Bayes Nets

Page 3: Introduction

• So far we have assumed

- The world does not change

- Static probability distribution

• But the world does evolve over time

- How can we use probabilistic inference for weather prediction, stock market prediction, patient monitoring, robot localization, ...?

Page 4: World dynamics

Page 5: World dynamics

Page 6: Dynamic Inference

• To reason over time we need to consider the following:

- Allow the world to evolve

- Set of states (all possible worlds)

- Set of time-slices (snapshots of the world)

- Different probability distributions over states at different time-slices

- Dynamic encoding of how distributions change over time

Page 7: Stochastic Process

• Set of states: S

• Stochastic dynamics: P(st|st-1,...,s0)

• Can be viewed as a Bayes Net with one random variable per time-slice

Bayes net: s0 → s1 → s2 → s3 → s4, with CPTs P(s0), P(s1|s0), P(s2|s1,s0), P(s3|s2,s1,s0), P(s4|s3,...,s0)

Page 8: Stochastic Process

• Problems:

- Infinitely many variables

- Infinitely large CPTs

• Solutions:

- Stationary process: Dynamics do not change over time

- Markov assumption: Current state depends only on a finite history of past states

Page 9: k-Order Markov Process

• Assumption: last k states are sufficient

• First-order Markov process (Most commonly used)

- P(st|st-1,...,s0)=P(st|st-1)

• Second-order Markov process

- P(st|st-1,...,s0)=P(st|st-1,st-2)

Graphs: first-order chain s0 → s1 → s2 → s3 → s4 (each state depends only on its immediate predecessor); second-order chain with arcs from the two previous states.
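For instance, under the first-order assumption the chain-rule expansion of the joint distribution collapses to a product of two-slice factors:

P(s0, s1, ..., st) = P(s0) P(s1|s0) P(s2|s1) ... P(st|st-1) = P(s0) ∏_{i=1..t} P(si|si-1)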

Page 10: k-Order Markov Process

• Advantages

- Can specify the entire process using finitely many time slices

• Example: Two slices sufficient for a first-order Markov process

- Graph: st-1 → st

- Dynamics: P(st|st-1)

- Prior: P(s0)
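As a small illustration (not from the slides), the two-slice specification is all that is needed to simulate the process; the two states and the probabilities below are invented placeholders.

import numpy as np

# Hypothetical 2-state example: state 0 = "sunny", state 1 = "rainy" (made-up numbers).
prior = np.array([0.7, 0.3])        # P(s0)
trans = np.array([[0.8, 0.2],       # row i gives P(st | st-1 = i)
                  [0.4, 0.6]])

def sample_chain(prior, trans, length, seed=0):
    # Sample s0, ..., s_{length-1} from a stationary first-order Markov process.
    rng = np.random.default_rng(seed)
    states = [rng.choice(len(prior), p=prior)]                        # s0 ~ P(s0)
    for _ in range(length - 1):
        states.append(rng.choice(len(prior), p=trans[states[-1]]))    # st ~ P(st | st-1)
    return states

print(sample_chain(prior, trans, 10))   # prints a list of 10 state indices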

Page 11: Example: Robot Localization

• Example of a first-order Markov process

• Robot’s belief about location

• Only last location is sufficient (no need for history)

Problem: uncertainty increases over time

(Figure: Thrun et al.)

Page 12: Hidden Markov Models

• In the previous example, the robot could use sensors to reduce location uncertainty

• In general:

- States not directly observable (uncertainty captured by a distribution)

- Uncertain dynamics increase state uncertainty

- Observations made via sensors can reduce state uncertainty

• Solution: Hidden Markov Model

Page 13: First-Order Hidden Markov Model (HMM)

• Set of states: S

• Set of observations: O

• Two models of uncertainties:

- Transition model: P(st|st-1)

- Observation model: P(ot|st)

• Prior: P(s0)

Hidden variables: s0 → s1 → s2 → s3 → s4

Observed random variables: o1, o2, o3, o4 (each oi depends on si)
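As a concrete (invented) illustration of these ingredients, a small discrete HMM can be stored as three arrays; the two states and two observation symbols below are placeholders. The filtering, prediction, smoothing, and Viterbi sketches on the following slides use the same array conventions.

import numpy as np

# Hypothetical 2-state, 2-observation HMM (all numbers made up for illustration).
prior = np.array([0.5, 0.5])           # P(s0)
trans = np.array([[0.7, 0.3],          # trans[i, j] = P(st = j | st-1 = i)   (transition model)
                  [0.3, 0.7]])
obs_model = np.array([[0.9, 0.1],      # obs_model[i, k] = P(ot = k | st = i) (observation model)
                      [0.2, 0.8]])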

Page 14: Markov chain vs. HMM

• Markov chain

- Observable random variables: s0 → s1 → s2 → s3 → s4

• Hidden Markov Model

- Hidden variables: s0 → s1 → s2 → s3 → s4

- Observed random variables: o1, o2, o3, o4

Page 15: Example: Robot Localization

• Hidden Markov Model

- S: (x,y) coordinates of the robot on the map

- O: distances to surrounding obstacles (measured by laser range finders or sonar)

- P(st|st-1): movement of the robot with uncertainty

- P(ot|st): uncertainty in the measurements provided by the sensors

• Localization corresponds to the query P(st|ot,...,o1), i.e., the current location given all observations up to now

Page 16: Inference

• Common inference tasks in an HMM: monitoring, prediction, hindsight, and most likely explanation (detailed on the following slides)

Page 17: Monitoring

• We are interested in the distribution over current states given observations: P(st|ot,...,o1)

- Examples: patient monitoring, robot localization

• Forward algorithm: corresponds to variable elimination

- Factors: P(s0), P(si|si-1), P(oi|si) 1≤i≤t

- Restrict o1,...,ot to observations made

- Sum out s0, ..., st-1

- ∑_{s0,...,st-1} P(s0) ∏_{i=1..t} P(si|si-1) P(oi|si)
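A minimal Python sketch of this forward computation, assuming the prior/trans/obs_model array conventions from the sketch on page 13; the running normalization yields P(st|ot,...,o1).

import numpy as np

def forward(prior, trans, obs_model, observations):
    # prior[i] = P(s0 = i); trans[i, j] = P(st = j | st-1 = i); obs_model[i, k] = P(ot = k | st = i).
    # observations is the list of observed symbols o1, ..., ot.
    belief = np.asarray(prior, dtype=float)
    for o in observations:
        belief = trans.T @ belief             # sum out the previous state (transition factor)
        belief = belief * obs_model[:, o]     # restrict the observation factor to the observed oi
        belief = belief / belief.sum()        # normalize: P(si | o1, ..., oi)
    return belief                             # P(st | o1, ..., ot)

For example, forward(prior, trans, obs_model, [0, 1, 1]) with the arrays from the page 13 sketch returns a distribution over the current state.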

Page 18: Prediction

• We are interested in distributions over future states given observations: P(st+k|ot,...,o1)

- Examples: weather prediction, stock market prediction

• Forward algorithm: corresponds to variable elimination

- Factors: P(s0), P(si|si-1), P(oi|si) 1≤i≤t+k

- Restrict o1,...,ot to observations made

- Sum out s0, ..., st+k-1 and ot+1, ..., ot+k

- ∑_{s0,...,st+k-1, ot+1,...,ot+k} P(s0) ∏_{i=1..t+k} P(si|si-1) P(oi|si)
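A matching sketch for prediction, under the same array conventions: after filtering up to time t, the unobserved ot+1, ..., ot+k sum out to 1, so prediction just pushes the filtered belief through the transition model k times.

def predict(belief, trans, k):
    # belief = P(st | o1..ot) as returned by forward(); trans[i, j] = P(st = j | st-1 = i).
    # The future observations marginalize to 1, so only the transition model is applied.
    for _ in range(k):
        belief = trans.T @ belief
    return belief               # P(st+k | o1, ..., ot)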

Page 19: Hindsight

• We are interested in the distribution over a past state given observations: P(sk|ot,...,o1) for some k < t

- Example: crime scene investigation

• Forward-backward algorithm: corresponds to variable elimination

- Factors: P(s0), P(si|si-1), P(oi|si) 1≤i≤t

- Restrict o1,...,ot to observations made

- Sum out s0, ..., sk-1, sk+1, ..., st

- ∑_{s0,...,sk-1, sk+1,...,st} P(s0) ∏_{i=1..t} P(si|si-1) P(oi|si)
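A hedged sketch of this forward-backward computation, reusing the array conventions of the earlier sketches: the forward pass handles o1, ..., ok and the backward pass handles ok+1, ..., ot.

import numpy as np

def smooth(prior, trans, obs_model, observations, k):
    # Return P(sk | o1, ..., ot), where observations = [o1, ..., ot] and 0 <= k <= t.
    fwd = np.asarray(prior, dtype=float)           # forward pass: P(sk | o1..ok)
    for o in observations[:k]:
        fwd = trans.T @ fwd
        fwd = fwd * obs_model[:, o]
        fwd = fwd / fwd.sum()
    bwd = np.ones_like(fwd)                        # backward pass: P(ok+1..ot | sk)
    for o in reversed(observations[k:]):
        bwd = trans @ (obs_model[:, o] * bwd)
    post = fwd * bwd                               # combine the two factors and renormalize
    return post / post.sum()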

Page 20: Example

HMM: hidden states s0 → s1 → s2 → s3 → s4 with observations o1, ..., o4

P(sk|ot,...,o1) ∝ P(ok+1,...,ot|sk) P(sk|ok,...,o1)

The first factor is computed by the backward algorithm, the second by the forward algorithm.

Exercise: prove the above. Hint: use Bayes rule.

Page 21: Example (continued)

Same decomposition as on the previous slide, annotated as an instance of Bayes rule: the two factors play the roles of P(A|B) and P(B), with B = sk and A = the future observations ok+1, ..., ot.

Exercise: prove the above. Hint: use Bayes rule.

Page 22: Example (continued)

• Forward algorithm: computes P(sk|ok,...,o1) recursively; the derivation uses marginalization (over the previous state) and the conditional independence assumptions of the HMM.

• Backward algorithm: computes P(ok+1,...,ot|sk) recursively; the derivation likewise uses marginalization (over the next state) and conditional independence.

Page 23: Example (continued)

• The forward and backward derivations continued; the steps are labelled marginalization, Bayes rule, and conditional independence (C.I.).
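As a sketch of the kind of derivation these example slides step through, using only Bayes rule, conditional independence (C.I.), and marginalization:

P(sk|o1,...,ot) ∝ P(ok+1,...,ot|sk, o1,...,ok) P(sk|o1,...,ok)        (Bayes rule)
               = P(ok+1,...,ot|sk) P(sk|o1,...,ok)                    (C.I.)

Backward: P(ok+1,...,ot|sk) = ∑_{sk+1} P(sk+1|sk) P(ok+1|sk+1) P(ok+2,...,ot|sk+1)      (marginalization, then C.I.)

Forward: P(sk|o1,...,ok) ∝ P(ok|sk) ∑_{sk-1} P(sk|sk-1) P(sk-1|o1,...,ok-1)             (Bayes rule, marginalization, C.I.)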

Page 24: Most Likely Explanation

• We are interested in the most likely sequence of states given the observations: argmax_{s0,...,st} P(s0,...,st|ot,...,o1)

- Example: speech recognition

• Viterbi algorithm: Corresponds to a variant of variable elimination

• Similar to forward algorithm: except selecting the best

- Factors: P(s0), P(si|si-1), P(oi|si) 1≤i≤t

- Restrict o1,...,ot to observations made

- Max out s0, ..., st-1

- max_{s0,...,st-1} P(s0) ∏_{i=1..t} P(si|si-1) P(oi|si)
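A minimal sketch of the Viterbi recursion in Python, replacing the sums of the forward pass with maximizations and keeping back-pointers to recover the best sequence (same array conventions as the earlier sketches).

import numpy as np

def viterbi(prior, trans, obs_model, observations):
    # Return the most likely state sequence s0, ..., st given observations o1, ..., ot.
    delta = np.asarray(prior, dtype=float)        # delta[j] = probability of the best path ending in state j
    backptr = []
    for o in observations:
        scores = delta[:, None] * trans           # scores[i, j]: best path into i, then transition i -> j
        backptr.append(scores.argmax(axis=0))     # best predecessor of each state j
        delta = scores.max(axis=0) * obs_model[:, o]
    # Trace back from the best final state.
    path = [int(delta.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return list(reversed(path))                   # [s0, s1, ..., st]

For long sequences it is preferable to work with log-probabilities to avoid numerical underflow.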

Page 25: Complexity of Temporal Inference

• Hidden Markov Models are Bayes Nets with a polytree structure

• Variable elimination is

- Linear in the number of time slices

- Linear in the size of the largest CPT (P(st|st-1) or P(ot|st))

Page 26: Dynamic Bayes Nets

• What if the number of states or observations is exponentially large?

• Dynamic Bayes Nets

- Idea: Encode states and observations with several random variables

- Advantage: Exploit conditional independence and save time and space

- Note: HMMs are just DBNs with one state variable and one observation variable
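A small sketch (with two invented binary state variables A and B) of the saving that factoring gives: the transition model is stored as one small CPT per variable instead of one joint table over all state variables, which grows exponentially with the number of variables.

import numpy as np

# Factored transition model for a hypothetical DBN:
#   A_t depends only on A_{t-1};  B_t depends on A_{t-1} and B_{t-1}.
p_A = np.array([0.9, 0.3])            # p_A[a]    = P(A_t = 1 | A_{t-1} = a)               (2 numbers)
p_B = np.array([[0.8, 0.5],           # p_B[a, b] = P(B_t = 1 | A_{t-1} = a, B_{t-1} = b)  (4 numbers)
                [0.4, 0.1]])

# The equivalent flat HMM transition table ranges over all joint states (A, B),
# i.e. a 4 x 4 table; with n binary state variables it would be 2^n x 2^n.
flat = np.zeros((4, 4))
for a in (0, 1):
    for b in (0, 1):
        for a2 in (0, 1):
            for b2 in (0, 1):
                pa = p_A[a] if a2 else 1 - p_A[a]
                pb = p_B[a, b] if b2 else 1 - p_B[a, b]
                flat[2 * a + b, 2 * a2 + b2] = pa * pb

print(flat.sum(axis=1))   # each row sums to 1, confirming a valid joint transition model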

Page 27: Example: Robot Localization

• States: (x,y) coordinates and heading θ

• Observations: laser and sonar readings (la and so)

Page 28: DBN with conditional independence

Page 29: DBN Complexity

• Conditional independence allows us to represent the transition and observation models very compactly!

• Time and space complexity of inference: conditional independence rarely helps

- Inference tends to be exponential in the number of state variables

- Intuition: All state variables eventually get correlated

- No better than with HMMs

Page 30: Non-Stationary Processes

• What if the process is not stationary?

- Solution: Add new state components until dynamics are stationary

- Example: Robot navigation based on (x,y,θ) is nonstationary when velocity varies

- Solution: Add velocity to state description (x,y,v,θ)

- If velocity varies, then add acceleration,...

Page 31: Non-Markovian Processes

• What if the process is not Markovian?

- Solution: Add new state components until the dynamics are Markovian

- Example: Robot navigation based on (x,y,θ) is non-Markovian when influenced by battery level

- Solution: Add battery level to state description (x,y,θ,b)

Page 32: Markovian Stationary Processes

• Problem: Adding components to the state description to force a process to be Markovian and stationary may significantly increase computational complexity

• Solution: Try to find the smallest description that is self-sufficient (i.e. Markovian and stationary)

Page 33: Summary

• Stochastic Process

- Stationary: dynamics do not change over time

- Markov assumption: the current state depends only on a finite portion of the history, not all of it

• Hidden Markov Models

- Prediction

- Monitoring

- Hindsight

- Most likely explanation

• Dynamic Bayes Nets

• What to do if the stationary or Markov assumptions do not hold