Knowledge Repn. & Reasoning Lec #24: Approximate Inference in DBNs


Knowledge Repn. & Reasoning
Lec #24: Approximate Inference in DBNs

UIUC CS 498: Section EA
Professor: Eyal Amir
Fall Semester 2004

(Some slides by X. Boyen & D. Koller, and by S. H. Lim; some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)

Dynamic Systems

• Filtering in stochastic, dynamic systems:
  – Monitoring freeway traffic (for an autonomous driver or for traffic analysis)
  – Monitoring a patient's symptoms

• Models to deal with uncertainty and/or partial observability in dynamic systems:
  – Hidden Markov Models (HMMs), Kalman filters, etc.
  – All are special cases of Dynamic Bayesian Networks (DBNs)

Previously

• Exact DBN inference:
  – Filtering
  – Smoothing
  – Projection
  – Explanation

DBN Myth

• Bayesian Network: a decomposed structure to represent the full joint distribution

• Does it imply easy decomposition for the belief state?

• No!

Tractable, approximate representation

• Exact inference in DBN is intractable

• Need approximation:
  – Maintain an approximate belief state
  – E.g. assume Gaussian processes

• Today:
  – Factored belief state approximation [Boyen & Koller '98]
  – Particle filtering (if time permits)

Idea

• Use a decomposable representation for the belief state (pre-assume some independence)

Problem

• What about the approximation errors?
  – They might accumulate and grow without bound…

Contraction property

• Main result:
  – If the process is mixing, then every state transition results in a contraction of the distance between the two distributions by a constant factor
  – Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely

Basic framework

• Definition 1:
  – Prior belief state:
    $\sigma^{(t)}[s_i] = P\big[\, s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)} \,\big]$
  – Posterior belief state:
    $\tilde{\sigma}^{(t)}[s_i] = P\big[\, s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)}, r_{h_t}^{(t)} \,\big]$

• Monitoring task:
  – Propagation through the transition model:
    $\sigma^{(t+1)}[s_j] = \sum_{i=1}^{n} \tilde{\sigma}^{(t)}[s_i]\, T[s_i \to s_j]$
  – Conditioning on the new observation:
    $\tilde{\sigma}^{(t+1)}[s_i] = \dfrac{\sigma^{(t+1)}[s_i]\, O[s_i, r_{h_{t+1}}]}{\sum_{l=1}^{n} \sigma^{(t+1)}[s_l]\, O[s_l, r_{h_{t+1}}]}$
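To make these two monitoring steps concrete, here is a minimal Python/NumPy sketch of exact monitoring for a flat (non-factored) state space; the matrices T and O, the function name, and the observation sequence are made-up illustrative values, not anything from the slides.

```python
import numpy as np

def monitor_step(sigma_post, T, O, obs):
    """One exact monitoring step for a flat state space.

    sigma_post : posterior belief over states at time t, shape (n,)
    T          : transition matrix, T[i, j] = P(s_j at t+1 | s_i at t)
    O          : observation matrix, O[i, r] = P(r | s_i)
    obs        : index of the observation received at time t+1
    """
    # Propagation through the transition model.
    sigma_prior = sigma_post @ T                  # sigma^{(t+1)}[s_j]
    # Conditioning on the new observation, then renormalize.
    unnorm = sigma_prior * O[:, obs]
    return unnorm / unnorm.sum()                  # ~sigma^{(t+1)}[s_i]

# Tiny made-up example: 3 states, 2 possible observations.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
O = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
belief = np.array([1/3, 1/3, 1/3])
for obs in [0, 1, 1]:                             # made-up observation sequence
    belief = monitor_step(belief, T, O, obs)
print(belief)
```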

Simple contraction

• Distance measure:
  – Relative entropy (KL-divergence) between the actual and the approximate belief state:
    $D[\sigma \,\|\, \hat{\sigma}] = \sum_i \sigma[i] \ln \dfrac{\sigma[i]}{\hat{\sigma}[i]}$

• Contraction due to O:
    $E_{r^{(t)}}\big[\, D[\, O[\sigma^{(t)}] \,\|\, O[\hat{\sigma}^{(t)}] \,] \,\big] \;\le\; D[\sigma^{(t)} \,\|\, \hat{\sigma}^{(t)}]$

• Contraction due to T (can we do better?):
    $D[\, T[\sigma^{(t)}] \,\|\, T[\hat{\sigma}^{(t)}] \,] \;\le\; D[\sigma^{(t)} \,\|\, \hat{\sigma}^{(t)}]$
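A quick numeric check of the T-contraction inequality, reusing the illustrative transition matrix from the sketch above; both belief vectors here are made up.

```python
import numpy as np

def kl(p, q):
    """Relative entropy D[p || q] = sum_i p[i] ln(p[i]/q[i])."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

sigma     = np.array([0.6, 0.3, 0.1])   # "actual" belief (made up)
sigma_hat = np.array([0.3, 0.4, 0.3])   # "approximate" belief (made up)

print("before T:", kl(sigma, sigma_hat))
print("after  T:", kl(sigma @ T, sigma_hat @ T))   # should be no larger
```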

Simple contraction (cont)

• Definition:
  – Minimal mixing rate:
    $\gamma_Q = \min_{i_1, i_2} \sum_{j=1}^{n} \min\big( Q[\, j \mid i_1\,],\; Q[\, j \mid i_2\,] \big)$

• Theorem 3 (the single-process contraction theorem):
  – For process Q, anterior distributions φ and ψ, and ulterior distributions φ′ and ψ′:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma_Q)\, D[\varphi \,\|\, \psi]$
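The minimal mixing rate can be computed directly from a transition matrix; a small sketch (same illustrative matrix as above, function name my own):

```python
import numpy as np

def minimal_mixing_rate(Q):
    """gamma_Q = min over state pairs (i1, i2) of sum_j min(Q[i1, j], Q[i2, j])."""
    n = Q.shape[0]
    return min(np.minimum(Q[i1], Q[i2]).sum()
               for i1 in range(n) for i2 in range(n))

T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
gamma = minimal_mixing_rate(T)
print(gamma)   # the per-step contraction factor is at most (1 - gamma)
```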

Simple contraction (cont)

• Proof Intuition:

Compound processes

• The mixing rate can be very small for large processes
• The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses
• Fully independent subprocesses:
  – Theorem 5: For L independent subprocesses T_1, …, T_L, let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma)\, D[\varphi \,\|\, \psi]$

Compound processes (cont)

• Conditionally independent subprocesses
• Theorem 6 (the main theorem):
  – For L conditionally independent subprocesses T_1, …, T_L, assume each process depends on at most r others and influences at most q others. Let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma^{*})\, D[\varphi \,\|\, \psi], \quad \text{where } \gamma^{*} = \gamma^{r} / q$

Efficient, approximate monitoring

• If each approximation step incurs an error bounded by ε, then the total error is at most
    $\varepsilon + (1 - \gamma^{*})\,\varepsilon + (1 - \gamma^{*})^{2}\,\varepsilon + \cdots \;=\; \varepsilon / \gamma^{*}$
• => the error remains bounded indefinitely
• Conditioning on observations might introduce momentary errors, but the expected error will contract
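As a sanity check on this bound, iterating "contract by (1 − γ*), then add a fresh error ε" converges to ε/γ*; the values of γ* and ε below are made-up numbers.

```python
gamma_star, eps = 0.25, 0.01     # made-up contraction rate and per-step error
err = 0.0
for _ in range(200):
    err = (1 - gamma_star) * err + eps   # contract, then incur a new approximation error
print(err, eps / gamma_star)             # both approach 0.04
```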

Approximate DBN monitoring

• Algorithm (based on standard clique-tree inference):
  1. Construct a clique tree from the 2-TBN.
  2. Initialize the clique tree with the conditional probabilities from the CPTs of the DBN.
  3. For each time step:
     a. Create a working copy Y of the tree. Create σ(t+1).
     b. For each subprocess l, incorporate the marginal σ(t)[X_l^(t)] into the appropriate factor in Y.
     c. Incorporate the evidence r(t+1) in Y.
     d. Calibrate the potentials in Y.
     e. For each l, query Y for the marginal over X_l^(t+1) and store it in σ(t+1).
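Setting the clique-tree bookkeeping aside, the heart of the Boyen-Koller step is: form the approximate joint belief as a product of the stored marginals, do one exact propagate-and-condition step, and project back onto the marginals. The following is a minimal brute-force sketch of that idea for small discrete subprocesses; it is not the clique-tree algorithm above, and the names (bk_step, joint_T, obs_lik) and numbers are mine.

```python
import numpy as np

def bk_step(marginals, joint_T, obs_lik):
    """One factored (Boyen-Koller style) update for a tiny discrete DBN, by brute force.

    marginals : list of 1-D arrays, the stored per-subprocess marginals sigma^{(t)}
    joint_T   : transition matrix over the *joint* state, shape (N, N)
    obs_lik   : likelihood of the current evidence for each joint state, shape (N,)
    returns   : list of updated marginals sigma^{(t+1)} (the projection step)
    """
    sizes = [len(m) for m in marginals]
    # 1. Approximate the joint as a product of marginals (the factored belief state).
    joint = marginals[0]
    for m in marginals[1:]:
        joint = np.outer(joint, m).ravel()
    # 2. One exact step: propagate through T, condition on the evidence, renormalize.
    joint = (joint @ joint_T) * obs_lik
    joint /= joint.sum()
    # 3. Project back onto per-subprocess marginals (where the approximation error enters).
    joint = joint.reshape(sizes)
    return [joint.sum(axis=tuple(a for a in range(len(sizes)) if a != k))
            for k in range(len(sizes))]

# Two binary subprocesses: a made-up 4x4 joint transition and evidence likelihood.
rng = np.random.default_rng(0)
joint_T = rng.random((4, 4)); joint_T /= joint_T.sum(axis=1, keepdims=True)
obs_lik = rng.random(4)
marginals = [np.array([0.5, 0.5]), np.array([0.8, 0.2])]
print(bk_step(marginals, joint_T, obs_lik))
```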

Conclusion of Factored DBNs

• Accuracy-efficiency tradeoff:
  – A smaller partition =>
    • Faster inference
    • Better contraction
    • Worse approximation

• Key to a good approximation:
  – Discover weak/sparse interactions among subprocesses and factor the DBN along these lines
  – Domain knowledge helps

Agenda

• Factored inference in DBNs

• Sampling: Particle Filtering

A sneak peek at particle filtering

Introduction

• Analytical methods:
  – Kalman filter: linear-Gaussian models
  – HMM: models with finite state space

• Statistical approximation methods for non-parametric distributions and large discrete DBNs

• Different names:
  – Sequential Monte Carlo (Handschin and Mayne 1969; Akashi and Kumamoto 1975)
  – Particle filtering (Doucet et al. 1997)
  – Survival of the fittest (Kanazawa, Koller and Russell 1995)
  – Condensation in computer vision (Isard and Blake 1996)

Outline

• Importance Sampling (IS) revisited
  – Sequential IS (SIS)
  – Particle Filtering = SIS + Resampling

• Dynamic Bayesian Networks
  – A simple example: the ABC network

• Inference in DBNs:
  – Exact inference
  – Pure Particle Filtering
  – Rao-Blackwellised PF

• Demonstration in the ABC network
• Discussions

Importance Sampling Revisited

• Goal: evaluate a posterior expectation of the form
  $I(f) = E\big[ f(x_{0:k}) \mid y_{0:k} \big] = \int f(x_{0:k})\, p(x_{0:k} \mid y_{0:k})\, dx_{0:k}$

• Importance Sampling (batch mode):
  – Sample $x_{0:k}^{(i)}$ from an importance function $\pi(x_{0:k} \mid y_{0:k})$
  – Assign the importance ratio $w^{(i)} \propto p(x_{0:k}^{(i)} \mid y_{0:k}) \,/\, \pi(x_{0:k}^{(i)} \mid y_{0:k})$ as the weight of each sample
  – The posterior estimate of $I(f)$ is the weighted average $\hat{I}(f) = \sum_i \tilde{w}^{(i)} f(x_{0:k}^{(i)})$, with normalized weights $\tilde{w}^{(i)}$

Note (H. Zhou): the importance function's support must include that of the state posterior, and it must be normalized.
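A minimal sketch of batch importance sampling for a posterior expectation; the "posterior" and the proposal below are made-up Gaussian stand-ins chosen only so that the answer is easy to check.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):      # stand-in for the state posterior p(x | y); unnormalized is fine
    return np.exp(-0.5 * (x - 1.0) ** 2)    # because the weights are self-normalized below

def proposal_pdf(x):    # importance function pi(x | y); its support covers the target's
    return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))

N = 10_000
x = rng.normal(0.0, 3.0, size=N)            # sample from the proposal
w = target_pdf(x) / proposal_pdf(x)         # importance weights
w /= w.sum()                                # normalize
print(np.sum(w * x))                        # estimate of E[x | y] (should be near 1.0)
```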

Sequential Importance Sampling

• How to make it sequential?
• Choose the importance function to factorize recursively:
    $\pi(x_{0:k} \mid y_{0:k}) = \pi(x_{0:k-1} \mid y_{0:k-1})\, \pi(x_k \mid x_{0:k-1}, y_{0:k})$
  so that the weights can be updated recursively:
    $w_k^{(i)} \propto w_{k-1}^{(i)}\, \dfrac{p(y_k \mid x_k^{(i)})\, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{\pi(x_k^{(i)} \mid x_{0:k-1}^{(i)}, y_{0:k})}$
• We get the SIS filter
• Benefit of SIS:
  – The observations y_k don't have to be given in batch
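A sketch of the SIS recursion under the common (assumed here) choice of importance function π(x_k | x_{0:k-1}, y_{0:k}) = p(x_k | x_{k-1}), for which the weight update reduces to multiplying by the observation likelihood; the scalar linear-Gaussian model and all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                                    # number of samples (particles)

# Made-up scalar model: x_k = 0.9 x_{k-1} + noise,  y_k = x_k + noise.
def propagate(x): return 0.9 * x + rng.normal(0.0, 1.0, size=x.shape)
def lik(y, x):    return np.exp(-0.5 * (y - x) ** 2)

x = rng.normal(0.0, 1.0, size=N)           # initial samples
logw = np.zeros(N)                         # log importance weights

for y in [0.3, 1.1, 0.7]:                  # observations arrive one at a time
    x = propagate(x)                       # sample x_k ~ pi = p(x_k | x_{k-1})
    logw += np.log(lik(y, x))              # w_k = w_{k-1} * p(y_k | x_k)
    w = np.exp(logw - logw.max()); w /= w.sum()
    print("estimate:", np.sum(w * x))      # weighted posterior mean of x_k
```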

Resampling

• Why do we need to resample?
  – Degeneracy of SIS:
    • The variance of the importance weights (with y_{0:k} treated as a random variable) increases at each recursion step
  – The optimal importance function is p(x_k | x_{k-1}, y_k):
    • We would need to sample from p(x_k | x_{k-1}, y_k) and evaluate p(y_k | x_{k-1}), which is often not possible

• Resampling: eliminate particles with small weights and concentrate on particles with large weights

Resampling (cont)

• Measure of degeneracy: the effective sample size, estimated as
    $\hat{N}_{\mathrm{eff}} = 1 \big/ \sum_i \big(\tilde{w}_k^{(i)}\big)^2$

Note (H. Zhou): N_eff should be large enough; otherwise the variance of the sample weights will be too large. Proved by [Kong, Liu and Wong 1994].

Resampling Step

Particle filtering = SIS + Resampling
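Putting the pieces together, a bootstrap particle filter is SIS plus a resampling step triggered when the effective sample size drops too low. A sketch using the same made-up scalar model as above and systematic resampling (one common scheme among several); the threshold N/2 is also a conventional, not mandated, choice.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

def propagate(x): return 0.9 * x + rng.normal(0.0, 1.0, size=x.shape)
def lik(y, x):    return np.exp(-0.5 * (y - x) ** 2)

def systematic_resample(w):
    """Return particle indices drawn in proportion to the normalized weights w."""
    cs = np.cumsum(w)
    cs[-1] = 1.0                           # guard against floating-point round-off
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(cs, positions)

x = rng.normal(0.0, 1.0, size=N)
w = np.full(N, 1.0 / N)

for y in [0.3, 1.1, 0.7, -0.2]:
    x = propagate(x)                       # SIS: propagate through the transition prior
    w = w * lik(y, x)                      # SIS: reweight by the observation likelihood
    w /= w.sum()
    n_eff = 1.0 / np.sum(w ** 2)           # effective sample size
    if n_eff < N / 2:                      # degeneracy: resample, then reset the weights
        x = x[systematic_resample(w)]
        w = np.full(N, 1.0 / N)
    print("estimate:", np.sum(w * x))
```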

Rao-Blackwellisation for SIS

• A method to reduce the variance of the final posterior estimate
• Useful when the state can be partitioned as x_k = (x_k^1, x_k^2) such that one part (say x_k^2) can be marginalized analytically
• Assuming p(x_k^2 | x_{0:k}^1, y_{0:k}) can be evaluated analytically given x_{0:k}^1, one can rewrite the posterior estimate so that only x_k^1 is sampled while x_k^2 is handled exactly (e.g. by an HMM filter or a Kalman filter)
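Structurally, a Rao-Blackwellised particle filter carries, for each particle, a sampled trajectory of the "hard" part of the state plus an exact conditional filter over the tractable part. The sketch below is in the spirit of the ABC example (sample B with a particle filter, filter A exactly per particle); every CPT, name, and evidence value is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200                                      # particles over B only

# Made-up CPTs (all variables binary).
T_B = np.array([[0.8, 0.2], [0.3, 0.7]])     # P(B' | B)
T_A = np.array([[[0.9, 0.1], [0.6, 0.4]],    # P(A' | A, B'), indexed [B'][A][A']
                [[0.4, 0.6], [0.1, 0.9]]])
O_B = np.array([[0.7, 0.3], [0.2, 0.8]])     # P(y_B | B)
O_A = np.array([[0.6, 0.4], [0.1, 0.9]])     # P(y_A | A)

B = rng.integers(0, 2, size=N)               # sampled B value for each particle
alpha = np.full((N, 2), 0.5)                 # exact P(A | B-trajectory, evidence) per particle
w = np.full(N, 1.0 / N)

for y_B, y_A in [(0, 1), (1, 1), (1, 0)]:    # made-up evidence sequence
    # 1. Sample B' from its transition prior; reweight by its observation.
    B = np.array([rng.choice(2, p=T_B[b]) for b in B])
    w = w * O_B[B, y_B]
    # 2. Exact (HMM-style) update of the A-belief, conditional on the sampled B'.
    alpha = np.einsum('na,nab->nb', alpha, T_A[B]) * O_A[:, y_A]
    # 3. The evidence on A also reweights the particles (its marginal likelihood).
    w = w * alpha.sum(axis=1)
    alpha /= alpha.sum(axis=1, keepdims=True)
    w /= w.sum()
    print("P(A=1):", np.sum(w * alpha[:, 1]), " P(B=1):", np.sum(w * (B == 1)))
```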

Example: ABC network

Inference in DBN


Note (H. Zhou): … for the hidden variables; observation variables can have Gaussian distributions.

Exact inference in ABC network

Particle filtering

Rao-Blackwellised PF

Rao-Blackwellised PF (2)

Rao-Blackwellised PF (3)

Rao-Blackwellised PF (4)

Discussions

• Structure of the network:
  – A and C are dependent on B
  – y_t can also be separated into 3 independent parts
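One way to read this structural remark (my own spelling-out, with assumed notation y_t^A, y_t^B, y_t^C for the three evidence parts): because A and C interact only through B and each evidence part depends on its own variable, both the observation model and the conditional posterior factor, which is exactly what the Rao-Blackwellised filter exploits:

$P(y_t \mid A_t, B_t, C_t) = P(y_t^{A} \mid A_t)\, P(y_t^{B} \mid B_t)\, P(y_t^{C} \mid C_t)$

$P(A_t, C_t \mid B_{0:t}, y_{0:t}) = P(A_t \mid B_{0:t}, y^{A}_{0:t})\; P(C_t \mid B_{0:t}, y^{C}_{0:t})$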