Probabilistic Reasoning Over Time 2 - Queen's U

Post on 18-Dec-2021

Probabilistic Reasoning Over Time 2

CISC 453

Amy VanBerlo

vanberlo_a@live.ca

Adapted from Ch 15 AIMA3e

Overview

15.3 Hidden Markov Models – HMM

◦ + Power of Linear Algebra

15.4 Kalman Filters

◦ + Gaussian Distributions revisited

15.5 Dynamic Bayesian Networks

◦ The MEGAVARIABLE & Particle Filtering

Hidden Markov Models – HMM(15.3)

Recall:

initial state model: P(X_0)

transition model: P(X_i | X_{i-1})

sensor model: P(E_i | X_i)

Discrete State Variables!

Hidden Markov Models – HMM(15.3)

transition model: P(X_i | X_{i-1}) => Transition Matrix

sensor model: P(E_i | X_i) => Diagonal Matrices

Hidden Markov Models – HMM(15.3)

Transition Matrix

state variable: X_t takes an integer value from 1 to S, where S = number of possible states

transition model: P(X_t | X_{t-1}) becomes an S x S matrix T:

T_{ij} = P(X_t = j | X_{t-1} = i)

T_{ij} is the probability of a transition from state i to state j.

Hidden Markov Models – HMM(15.3)

Transition Matrix

EXAMPLE: From Umbrella World:

T_{ij} = P(X_t = j | X_{t-1} = i)

T = P(X_t | X_{t-1}) =

[ 0.7  0.3 ]
[ 0.3  0.7 ]
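A minimal sketch of the umbrella-world transition matrix in NumPy. The state ordering (index 0 = rain, index 1 = no rain) and the variable names are assumptions made for illustration; only the matrix entries come from the slide.

```python
import numpy as np

# Assumed state order: index 0 = rain, index 1 = no rain.
# T[i, j] = P(X_t = j | X_{t-1} = i), so each row is a probability distribution.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

# Sanity check: every row of a transition matrix sums to 1.
row_sums = T.sum(axis=1)

# One-step prediction from a belief vector: P(X_t) = T^T P(X_{t-1}).
belief = np.array([1.0, 0.0])   # hypothetical: certain it rained yesterday
predicted = T.T @ belief        # probability of rain / no rain today
```

Because T is indexed "from row, to column", predicting forward multiplies by the transpose of T rather than by T itself.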

Diagonal Matrices (Evidence Variable / Sensor Model)

E_t is known at time t => call its value e_t

Need P(e_t | X_t = i) for each state i => these form the diagonal of an S x S matrix O_t (all other entries are 0)

Hidden Markov Models – HMM(15.3)

Diagonal Matrices

EXAMPLE: From Umbrella World:

U_1 = true; U_3 = false;

O_1 =

[ 0.9  0   ]
[ 0    0.2 ]

O_3 =

[ 0.1  0   ]
[ 0    0.8 ]
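The two observation matrices above can be built from a single sensor model. This is a sketch only: the helper name `observation_matrix` and the state ordering (0 = rain, 1 = no rain) are assumptions, while the probabilities 0.9 and 0.2 are the diagonal entries of the umbrella-world example.

```python
import numpy as np

# P(umbrella = true | state), assumed state order: 0 = rain, 1 = no rain.
P_UMBRELLA_GIVEN_STATE = np.array([0.9, 0.2])

def observation_matrix(umbrella_seen: bool) -> np.ndarray:
    """Return the diagonal matrix O_t whose i-th diagonal entry is P(e_t | X_t = i)."""
    if umbrella_seen:
        likelihood = P_UMBRELLA_GIVEN_STATE
    else:
        likelihood = 1.0 - P_UMBRELLA_GIVEN_STATE
    return np.diag(likelihood)

O1 = observation_matrix(True)    # U_1 = true
O3 = observation_matrix(False)   # U_3 = false
```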

Hidden Markov Models – HMM(15.3)

Forward Eq => column vector f

f_{1:t+1} = α O_{t+1} T^T f_{1:t}   (T^T is the transpose of T)

Backward Eq => column vector b

b_{k+1:t} = T O_{k+1} b_{k+2:t}
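The forward and backward equations translate almost directly into matrix-vector code. The sketch below uses the umbrella-world matrices and assumes a uniform prior P(X_0) = (0.5, 0.5) and umbrella observations on days 1 and 2; the function names are illustrative.

```python
import numpy as np

# Umbrella world, assumed state order: 0 = rain, 1 = no rain.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])
O_TRUE = np.diag([0.9, 0.2])    # observation matrix when the umbrella is seen

def normalize(v):
    """Apply the normalizing constant alpha so the vector sums to 1."""
    return v / v.sum()

def forward(f, O_next):
    """f_{1:t+1} = alpha * O_{t+1} * T^T * f_{1:t}"""
    return normalize(O_next @ T.T @ f)

def backward(b, O_next):
    """b_{k+1:t} = T * O_{k+1} * b_{k+2:t}"""
    return T @ O_next @ b

prior = np.array([0.5, 0.5])          # assumed uniform P(X_0)
f1 = forward(prior, O_TRUE)           # filtered day 1, about (0.818, 0.182)
f2 = forward(f1, O_TRUE)              # filtered day 2, about (0.883, 0.117)

# Smoothing day 1 given both observations: combine f_{1:1} with b_{2:2}.
b2 = backward(np.ones(2), O_TRUE)     # backward message starts as a vector of 1s
smoothed_day1 = normalize(f1 * b2)    # about (0.883, 0.117)
```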

Hidden Markov Models – HMM(15.3)

Advantages

All computations become simple matrix-vector operations!

Complexity

Forward-backward algorithm: for a sequence of length t, time is O(S^2 t), since each step multiplies an S-element vector by an S x S matrix

Improved Smoothing Algorithm:…

Hidden Markov Models – HMM(15.3)

Example: Localization

Vacuum World, simplified:

Original: robot with four obstacle sensors (N, S, E, W)

Action: Move (N, S, E, or W)

Belief state: set of all possible locations the robot could be in

Additions: allow for sensor noise; use a probabilistic model of the robot’s motion

Domain of X_t: the set of empty squares {s_1, …, s_n}

Neighbours(s): the set of empty squares adjacent to s; N(s): the size of that set

Hidden Markov Models – HMM(15.3)

Example: Localization

Vacuum World, simplified:

Transition Model for Move:

P(X_t = j | X_{t-1} = i) = T_{ij} = 1/N(i) if j ∈ Neighbours(i), else 0

Unknown start state, so assume a uniform distribution over all squares: P(X_0 = i) = 1/n

E_t has 16 possible values (four bits, one per sensor direction)

ε: each sensor’s error rate
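As a rough illustration of the Move transition model and the uniform prior, the sketch below builds T for a tiny map. The 2 x 3 grid, the `'#'` obstacle marker and all names are hypothetical; only the rule "1/N(i) for each neighbour, else 0" and the prior 1/n come from the slide.

```python
import numpy as np

# Hypothetical map: '.' is an empty square, '#' is an obstacle.
GRID = ["..#",
        "..."]

# Enumerate the empty squares s_1..s_n and give each an index.
squares = [(r, c) for r, row in enumerate(GRID)
           for c, ch in enumerate(row) if ch == "."]
index = {sq: i for i, sq in enumerate(squares)}

def neighbours(sq):
    """Empty squares directly N, S, W or E of sq."""
    r, c = sq
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [nb for nb in candidates if nb in index]

n = len(squares)
T = np.zeros((n, n))
for sq in squares:
    nbrs = neighbours(sq)
    for nb in nbrs:
        # Each neighbour is equally likely: T_ij = 1 / N(i).
        T[index[sq], index[nb]] = 1.0 / len(nbrs)

# Unknown start state: uniform prior over all empty squares.
prior = np.full(n, 1.0 / n)
```

On this map every empty square has at least one empty neighbour, so every row of T is a proper distribution; an isolated square would need separate handling.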

Hidden Markov Models – HMM(15.3)

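Example: Localization

Vacuum World, simplified:

E_t has 16 possible values

ε: each sensor’s error rate; probability of getting all four bits right is (1 − ε)^4, all four wrong is ε^4

Discrepancy d_{it}: the number of bits that differ between the true values for square i and the reading e_t

Probability that a robot in square i would receive a sensor reading e_t:

P(E_t = e_t | X_t = i) = O_{t,ii} = (1 − ε)^{4 − d_{it}} ε^{d_{it}}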
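A small sketch of the noisy sensor model, assuming the four sensor bits fail independently with the same error rate. The bit ordering, the example reading and the value ε = 0.2 are made up for illustration.

```python
from itertools import product

def discrepancy(true_bits, reading):
    """d_it: number of bits that differ between the true values and the reading."""
    return sum(a != b for a, b in zip(true_bits, reading))

def sensor_likelihood(true_bits, reading, eps):
    """P(E_t = e_t | X_t = i) = (1 - eps)^(4 - d) * eps^d for 4 independent sensors."""
    d = discrepancy(true_bits, reading)
    return (1 - eps) ** (4 - d) * eps ** d

# Hypothetical square whose true obstacle bits (N, S, E, W) are 1, 0, 0, 1.
true_bits = (1, 0, 0, 1)
eps = 0.2

all_right = sensor_likelihood(true_bits, (1, 0, 0, 1), eps)   # (1 - eps)^4
all_wrong = sensor_likelihood(true_bits, (0, 1, 1, 0), eps)   # eps^4

# The likelihoods over all 16 possible readings form a distribution.
total = sum(sensor_likelihood(true_bits, r, eps)
            for r in product((0, 1), repeat=4))
```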

Kalman Filters (15.4)

Where HMMs deal with DISCRETE variables, Kalman Filter variables are CONTINUOUS

Ex: tracking a bird flying through the forest

◦ X_t = (X, Y, Z, X_veloc, Y_veloc, Z_veloc)

Kalman Filters (15.4)

Updating Gaussian Distributions

Current distribution: P(X_t | e_{1:t})

Prediction: P(X_{t+1} | e_{1:t}) = ∫ P(X_{t+1} | x_t) P(x_t | e_{1:t}) dx_t

Sensor Model: P(e_{t+1} | X_{t+1})

Updated distribution:

P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})
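For a single continuous variable, the predict-then-update cycle has a closed form. The sketch below assumes a one-dimensional random walk: X_{t+1} = X_t plus Gaussian noise with variance `var_x`, observed through a sensor with Gaussian noise of variance `var_z`. The function name and the numbers in the usage lines are illustrative assumptions, not taken from the slides.

```python
def kalman_1d(mu, var, z, var_x, var_z):
    """One predict/update cycle for a 1-D random walk with Gaussian noise.

    mu, var : mean and variance of the current estimate P(X_t | e_1:t)
    z       : new observation e_{t+1}
    var_x   : transition noise variance (assumed)
    var_z   : sensor noise variance (assumed)
    """
    # Prediction: the mean is unchanged, the uncertainty grows.
    mu_pred = mu
    var_pred = var + var_x

    # Update: blend prediction and observation according to their variances.
    gain = var_pred / (var_pred + var_z)
    mu_new = mu_pred + gain * (z - mu_pred)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new

# Hypothetical numbers: prior N(0, 1), transition variance 4, sensor variance 1.
mu1, var1 = kalman_1d(mu=0.0, var=1.0, z=2.5, var_x=4.0, var_z=1.0)
# mu1 is about 2.083 and var1 is about 0.833
```

Note that the new variance does not depend on the observation z, only on the noise variances.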

Kalman Filters (15.4)

Updating Gaussian Distributions

If P(X_t | e_{1:t}) is Gaussian, then the prediction P(X_{t+1} | e_{1:t}) is Gaussian.

If P(X_{t+1} | e_{1:t}) is Gaussian, then the updated distribution P(X_{t+1} | e_{1:t+1}) is Gaussian.

therefore:

P(X_t | e_{1:t}) is multivariate Gaussian N(μ_t, Σ_t) for all t

This holds when the model is LINEAR Gaussian => X_{t+1} is a linear function of X_t plus Gaussian noise
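In the multivariate case the same cycle updates a mean vector and a covariance matrix. What follows is a sketch of the standard linear-Gaussian Kalman update, not code from the slides; the matrix names `F`, `Q`, `H`, `R` and the constant-velocity example (position and velocity along one axis, only position observed) are assumptions.

```python
import numpy as np

def kalman_step(mu, Sigma, z, F, Q, H, R):
    """One predict/update cycle of a linear-Gaussian Kalman filter.

    F, Q : transition matrix and transition noise covariance (assumed names)
    H, R : sensor matrix and sensor noise covariance (assumed names)
    """
    # Prediction: push the Gaussian through the linear transition model.
    mu_pred = F @ mu
    Sigma_pred = F @ Sigma @ F.T + Q

    # Update: the Kalman gain weighs the prediction against the observation.
    S = H @ Sigma_pred @ H.T + R
    K = Sigma_pred @ H.T @ np.linalg.inv(S)
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ Sigma_pred
    return mu_new, Sigma_new

# Hypothetical constant-velocity model along one axis: state = (position, velocity).
dt = 1.0
F = np.array([[1.0, dt],
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])      # only position is observed
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])

mu, Sigma = kalman_step(np.array([0.0, 1.0]), np.eye(2),
                        np.array([1.2]), F, Q, H, R)
```

The result is again a mean and a covariance, which is the point of the slide: the belief stays Gaussian at every step.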

Kalman Filters (15.4)

Multivariate Gaussian Implication:

Filtering with a linear Gaussian model produces a Gaussian state distribution for all time

Why so important?

For general continuous-variable systems, the representation of the posterior grows without bound over time

Being able to model with Gaussian distributions keeps the representation a fixed size (mean and covariance), so calculations stay exact and cheap

Kalman Filters (15.4)

Where it Breaks:

Cannot be applied if the transition model is nonlinear

ex: bird evading a tree

The Extended Kalman Filter models the transition as locally linear around the current estimate; it fails if the system is not locally smooth.

Dynamic Bayesian Networks (15.5)

In general: each ‘slice’ of a DBN can have any number of:

state variables Xt

Sensor/evidence variables Et

Assume the variables and their relationships are replicated unchanged from slice t to slice t+1

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

Every HMM can be represented as a DBN with a single X_t and a single E_t

Every discrete-variable DBN can be represented as an HMM… Combine all X_t into one MEGAVARIABLE

◦ Values: all possible tuples of values of the individual X_t
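To make the MEGAVARIABLE concrete, the sketch below enumerates its values for a made-up slice with three Boolean state variables. The variable names are hypothetical.

```python
from itertools import product

# Hypothetical DBN slice with three Boolean state variables.
state_vars = ["Rain", "Wind", "Cold"]

# Each value of the megavariable is one tuple of values for all state variables.
mega_values = list(product([True, False], repeat=len(state_vars)))

# Map each tuple to an integer HMM state index.
index_of = {value: i for i, value in enumerate(mega_values)}

n_states = len(mega_values)   # 2^3 = 8 HMM states
```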

Dynamic Bayesian Networks (15.5)

vs Hidden Markov Models

◦ If interchangeable… where lies the difference?

“Sparseness”:

Ex. Suppose a DBN has 20 Boolean X_t, each with 3 parents

DBN transition model: 20 x 2^3 = 160 probabilities

HMM transition matrix: 2^20 states, 2^40 probabilities (~trillion!!)
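The arithmetic behind the sparseness example can be checked in a few lines (variable names are illustrative):

```python
n_vars = 20       # Boolean state variables in the DBN
n_parents = 3     # parents of each variable in the previous slice

# DBN: one conditional probability per parent configuration, per variable.
dbn_params = n_vars * 2 ** n_parents      # 20 * 8 = 160

# Equivalent HMM: one state per joint assignment, one matrix entry per state pair.
hmm_states = 2 ** n_vars                  # 1,048,576 states
hmm_params = hmm_states ** 2              # 2^40, roughly 1.1 trillion entries
```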

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN with continuous variables and linear Gaussian conditional distributions

Ex: tracking a bird flying through the forest

◦ X_t = (X, Y, Z, X_veloc, Y_veloc, Z_veloc)

Dynamic Bayesian Networks (15.5)

vs Kalman Filters

Every KF can be represented as a DBN, but few DBNs are KFs;

DBNs can model arbitrary distributions

A KF always models a single multivariate Gaussian distribution

Aspects of the real world (e.g. obstacles) introduce non-linearities and require a combination of discrete and continuous variables

Dynamic Bayesian Networks (15.5)

Constructing DBNs

Must specify:

1. prior distribution over state variables P(X_0)

2. transition model P(X_{t+1} | X_t)

3. sensor model P(E_t | X_t)

Must also specify the connections between slices

RECALL: the model assumes variables and their relationships are replicated unchanged from slice t to slice t+1

Simply specify the first slice and copy!

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs

We have seen inference in Bayesian networks before (ch. 14)

Given a sequence of observations, can construct the full Bayesian network representation of a DBN by replicating slices until the network is large enough

This is called Unrolling

Then apply an inference algorithm from ch. 14: variable elimination, clustering, etc.
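Unrolling is mostly bookkeeping: copy the slice once per time step and connect consecutive copies. The sketch below does this for the structure of the umbrella network only (node and edge names, no probabilities); the naming scheme and the function name are assumptions.

```python
def unroll_umbrella_dbn(t):
    """Return (nodes, edges) of the umbrella DBN unrolled for t evidence steps."""
    nodes = ["Rain_0"]                     # slice 0 holds only the prior state
    edges = []
    for k in range(1, t + 1):
        nodes += [f"Rain_{k}", f"Umbrella_{k}"]
        edges.append((f"Rain_{k - 1}", f"Rain_{k}"))     # transition model link
        edges.append((f"Rain_{k}", f"Umbrella_{k}"))     # sensor model link
    return nodes, edges

nodes, edges = unroll_umbrella_dbn(3)
# The unrolled network gains two nodes and two edges for every extra time step.
```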

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs

Unrolling

Problem:

inference cost for each update grows with t

Rollup Filtering: add slice t+1, “sum out” slice t using variable elimination

Largest factor is O(d^{n+1}), update cost O(d^{n+2}) (n state variables, each with domain size d)

Dynamic Bayesian Networks (15.5)

Exact Inference in DBNs

Unrolling

Can use DBNs to represent very complex temporal processes with many sparsely connected variables

but CANNOT reason efficiently and exactly about those processes!

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting adapted from (14.5)

sample the non-evidence nodes of the network in topological order, weighting each sample by the likelihood of the evidence

to avoid the growth problem seen in Exact Inference, can simply run all N samples together through the DBN, one slice at a time
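A sketch of likelihood weighting run slice by slice, using the umbrella world as the DBN. The uniform prior on X_0, the dictionary names and the sample count are assumptions; the transition and sensor probabilities are those of the earlier umbrella example.

```python
import random

# Umbrella world: P(rain_t = true | rain_{t-1}) and P(umbrella = true | rain_t).
P_RAIN_GIVEN_PREV = {True: 0.7, False: 0.3}
P_UMBRELLA_GIVEN_RAIN = {True: 0.9, False: 0.2}

def likelihood_weighting(evidence, n_samples, rng):
    """Estimate P(rain_t = true | evidence) by pushing all samples through each slice."""
    samples = [rng.random() < 0.5 for _ in range(n_samples)]   # assumed uniform prior
    weights = [1.0] * n_samples
    for umbrella in evidence:                                  # one slice at a time
        for i in range(n_samples):
            # Sample the state variable from the transition model.
            samples[i] = rng.random() < P_RAIN_GIVEN_PREV[samples[i]]
            # Multiply the weight by the likelihood of the observed evidence.
            p = P_UMBRELLA_GIVEN_RAIN[samples[i]]
            weights[i] *= p if umbrella else 1.0 - p
    rain_weight = sum(w for s, w in zip(samples, weights) if s)
    return rain_weight / sum(weights)

estimate = likelihood_weighting([True, True], 20000, random.Random(0))
# Exact filtering gives about 0.883 for this evidence sequence.
```

Only the current slice's samples and weights are stored, so memory does not grow with t.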

Dynamic Bayesian Networks (15.5)

Approximate Inference in DBNs

Likelihood Weighting STILL FLAWED!

LW samples pay no attention to evidence

◦ Fraction “agreeing” with the evidence falls exponentially with t

◦ # samples required grows exponentially with t

Idea: focus the set of samples on high-probability regions of the state space… Particle Filtering:

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

A population of N initial-state samples is created from the prior distribution P(X_0)

Update Cycle repeated for each time step:

1. propagate each sample forward: given x_t (the sample’s current state value), sample x_{t+1} from the transition model P(X_{t+1} | x_t)

2. weight each sample by the likelihood it assigns to the new evidence, P(e_{t+1} | x_{t+1})

3. resample the population to get N new unweighted samples, each chosen with probability proportional to its weight
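The three-step update cycle can be sketched for the umbrella world as follows. The uniform prior, the particle count and the function name are assumptions for illustration; the probabilities are those of the umbrella example.

```python
import random

# Umbrella world: P(rain_t = true | rain_{t-1}) and P(umbrella = true | rain_t).
P_RAIN_GIVEN_PREV = {True: 0.7, False: 0.3}
P_UMBRELLA_GIVEN_RAIN = {True: 0.9, False: 0.2}

def particle_filter_step(particles, umbrella, rng):
    """One update cycle: propagate, weight by evidence, resample."""
    # 1. Propagate each particle forward through the transition model.
    moved = [rng.random() < P_RAIN_GIVEN_PREV[x] for x in particles]
    # 2. Weight each particle by the likelihood it assigns to the new evidence.
    weights = [P_UMBRELLA_GIVEN_RAIN[x] if umbrella
               else 1.0 - P_UMBRELLA_GIVEN_RAIN[x] for x in moved]
    # 3. Resample N unweighted particles with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(0)
particles = [rng.random() < 0.5 for _ in range(20000)]   # assumed uniform prior
for umbrella in [True, True]:
    particles = particle_filter_step(particles, umbrella, rng)

# The fraction of particles with rain = true approximates P(rain_2 | u_1, u_2).
estimate = sum(particles) / len(particles)
```

Unlike likelihood weighting, the resampling step discards low-weight particles at every slice, so the population stays concentrated where the evidence points.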

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Assume consistent at time t:

◦ N(x_t | e_{1:t}) / N = P(x_t | e_{1:t})

Propagate forward: populations of x_{t+1} are

◦ N(x_{t+1} | e_{1:t}) = ∑_{x_t} P(x_{t+1} | x_t) N(x_t | e_{1:t})

Weight samples by their likelihood for e_{t+1}:

◦ W(x_{t+1} | e_{1:t+1}) = P(e_{t+1} | x_{t+1}) N(x_{t+1} | e_{1:t})

Resample to obtain populations proportional to W:

◦ N(x_{t+1} | e_{1:t+1}) / N = ……

◦ = P(x_{t+1} | e_{1:t+1})

Dynamic Bayesian Networks (15.5)

Inference in DBNs- Particle Filtering

Performance:

Approximation error of PF remains bounded over time :D

At least empirically! – Theoretical analysis difficult.

Summary

Temporal models use state and sensor variables replicated over time

Hidden Markov Models have a single discrete state variable

Kalman Filters allow n continuous state variables, with linear Gaussian models and multivariate Gaussian distributions

Dynamic Bayesian Nets can represent every HMM and KF; the reverse holds only in some cases

◦ Particle Filtering is a good approximate inference (filtering) algorithm for DBNs

Thanks!

Questions?