Knowledge Repn. & Reasoning Lec #24: Approximate Inference in DBNs


Knowledge Repn. & Reasoning
Lec #24: Approximate Inference in DBNs

UIUC CS 498: Section EA
Professor: Eyal Amir
Fall Semester 2004

(Some slides by X. Boyen & D. Koller, and by S. H. Lim; some slides by Doucet, de Freitas, Murphy, Russell, and H. Zhou)

Dynamic Systems

• Filtering in stochastic, dynamic systems:
  – Monitoring freeway traffic (for an autonomous driver or for traffic analysis)
  – Monitoring a patient's symptoms

• Models to deal with uncertainty and/or partial observability in dynamic systems:
  – Hidden Markov Models (HMMs), Kalman filters, etc.
  – All are special cases of Dynamic Bayesian Networks (DBNs)

Previously

• Exact DBN inference:
  – Filtering
  – Smoothing
  – Projection
  – Explanation

DBN Myth

• Bayesian Network: a decomposed structure to represent the full joint distribution

• Does it imply easy decomposition for the belief state?

• No!

Tractable, approximate representation

• Exact inference in DBN is intractable

• Need approximation:
  – Maintain an approximate belief state
  – E.g. assume Gaussian processes

• Today:
  – Factored belief state approximation [Boyen & Koller '98]
  – Particle filtering (if time permits)

Idea

• Use a decomposable representation for the belief state (pre-assume some independence)

Problem

• What about the approximation errors?
  – They might accumulate and grow without bound…

Contraction property

• Main result:
  – If the process is mixing, then every state transition results in a contraction of the distance between the two distributions by a constant factor
  – Since approximation errors from previous steps decrease exponentially, the overall error remains bounded indefinitely

Basic framework

• Definition 1:
  – Prior belief state:
    $\sigma^{(t)}[s_i] = P\big[\, s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)} \,\big]$
  – Posterior belief state:
    $\tilde{\sigma}^{(t)}[s_i] = P\big[\, s_i^{(t)} \mid r_{h_0}^{(0)}, \ldots, r_{h_{t-1}}^{(t-1)}, r_{h_t}^{(t)} \,\big]$

• Monitoring task:
  – Propagation through the transition model:
    $\sigma^{(t+1)}[s_j] = \sum_{i=1}^{n} \tilde{\sigma}^{(t)}[s_i]\, T[s_i \to s_j]$
  – Conditioning on the new observation:
    $\tilde{\sigma}^{(t+1)}[s_i] = \dfrac{\sigma^{(t+1)}[s_i]\, O[s_i, r_{h_{t+1}}]}{\sum_{l=1}^{n} \sigma^{(t+1)}[s_l]\, O[s_l, r_{h_{t+1}}]}$
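To make these two monitoring steps concrete, here is a minimal Python/NumPy sketch of exact monitoring for a flat (non-factored) state space; the matrices T and O, the function name, and the observation sequence are made-up illustrative values, not anything from the slides.

```python
import numpy as np

def monitor_step(sigma_post, T, O, obs):
    """One exact monitoring step for a flat state space.

    sigma_post : posterior belief over states at time t, shape (n,)
    T          : transition matrix, T[i, j] = P(s_j at t+1 | s_i at t)
    O          : observation matrix, O[i, r] = P(r | s_i)
    obs        : index of the observation received at time t+1
    """
    # Propagation through the transition model.
    sigma_prior = sigma_post @ T                  # sigma^{(t+1)}[s_j]
    # Conditioning on the new observation, then renormalize.
    unnorm = sigma_prior * O[:, obs]
    return unnorm / unnorm.sum()                  # ~sigma^{(t+1)}[s_i]

# Tiny made-up example: 3 states, 2 possible observations.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
O = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.2, 0.8]])
belief = np.array([1/3, 1/3, 1/3])
for obs in [0, 1, 1]:                             # made-up observation sequence
    belief = monitor_step(belief, T, O, obs)
print(belief)
```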

Simple contraction

• Distance measure:
  – Relative entropy (KL-divergence) between the actual and the approximate belief state:
    $D[\sigma \,\|\, \hat{\sigma}] = \sum_i \sigma[i] \ln \dfrac{\sigma[i]}{\hat{\sigma}[i]}$

• Contraction due to O:
    $E_{r^{(t)}}\big[\, D[\, O[\sigma^{(t)}] \,\|\, O[\hat{\sigma}^{(t)}] \,] \,\big] \;\le\; D[\sigma^{(t)} \,\|\, \hat{\sigma}^{(t)}]$

• Contraction due to T (can we do better?):
    $D[\, T[\sigma^{(t)}] \,\|\, T[\hat{\sigma}^{(t)}] \,] \;\le\; D[\sigma^{(t)} \,\|\, \hat{\sigma}^{(t)}]$
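A quick numeric check of the T-contraction inequality, reusing the illustrative transition matrix from the sketch above; both belief vectors here are made up.

```python
import numpy as np

def kl(p, q):
    """Relative entropy D[p || q] = sum_i p[i] ln(p[i]/q[i])."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

sigma     = np.array([0.6, 0.3, 0.1])   # "actual" belief (made up)
sigma_hat = np.array([0.3, 0.4, 0.3])   # "approximate" belief (made up)

print("before T:", kl(sigma, sigma_hat))
print("after  T:", kl(sigma @ T, sigma_hat @ T))   # should be no larger
```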

Simple contraction (cont)

• Definition:
  – Minimal mixing rate:
    $\gamma_Q = \min_{i_1, i_2} \sum_{j=1}^{n} \min\big( Q[\, j \mid i_1\,],\; Q[\, j \mid i_2\,] \big)$

• Theorem 3 (the single-process contraction theorem):
  – For process Q, anterior distributions φ and ψ, and ulterior distributions φ′ and ψ′:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma_Q)\, D[\varphi \,\|\, \psi]$
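The minimal mixing rate can be computed directly from a transition matrix; a small sketch (same illustrative matrix as above, function name my own):

```python
import numpy as np

def minimal_mixing_rate(Q):
    """gamma_Q = min over state pairs (i1, i2) of sum_j min(Q[i1, j], Q[i2, j])."""
    n = Q.shape[0]
    return min(np.minimum(Q[i1], Q[i2]).sum()
               for i1 in range(n) for i2 in range(n))

T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
gamma = minimal_mixing_rate(T)
print(gamma)   # the per-step contraction factor is at most (1 - gamma)
```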

Simple contraction (cont)

• Proof Intuition:

Compound processes

• The mixing rate can be very small for large processes
• The trick is to assume some independence among subprocesses and factor the DBN along these subprocesses
• Fully independent subprocesses:
  – Theorem 5: For L independent subprocesses T_1, …, T_L, let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma)\, D[\varphi \,\|\, \psi]$

Compound processes (cont)

• Conditionally independent subprocesses
• Theorem 6 (the main theorem):
  – For L conditionally independent subprocesses T_1, …, T_L, assume each process depends on at most r others and influences at most q others. Let γ_l be the mixing rate of T_l and let γ = min_l γ_l. Let φ and ψ be distributions over S_1^(t), …, S_L^(t), and assume that ψ renders the S_l^(t) marginally independent. Then:
    $D[\varphi' \,\|\, \psi'] \;\le\; (1 - \gamma^{*})\, D[\varphi \,\|\, \psi], \quad \text{where } \gamma^{*} = \gamma^{r} / q$

Efficient, approximate monitoring

• If each approximation step incurs an error bounded by ε, then the total error is at most
    $\varepsilon + (1 - \gamma^{*})\,\varepsilon + (1 - \gamma^{*})^{2}\,\varepsilon + \cdots \;=\; \varepsilon / \gamma^{*}$
• => the error remains bounded indefinitely
• Conditioning on observations might introduce momentary errors, but the expected error will contract
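As a sanity check on this bound, iterating "contract by (1 − γ*), then add a fresh error ε" converges to ε/γ*; the values of γ* and ε below are made-up numbers.

```python
gamma_star, eps = 0.25, 0.01     # made-up contraction rate and per-step error
err = 0.0
for _ in range(200):
    err = (1 - gamma_star) * err + eps   # contract, then incur a new approximation error
print(err, eps / gamma_star)             # both approach 0.04
```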

Approximate DBN monitoring

• Algorithm (based on standard clique-tree inference):
  1. Construct a clique tree from the 2-TBN.
  2. Initialize the clique tree with the conditional probabilities from the CPTs of the DBN.
  3. For each time step:
     a. Create a working copy Y of the tree. Create σ(t+1).
     b. For each subprocess l, incorporate the marginal σ(t)[X_l^(t)] into the appropriate factor in Y.
     c. Incorporate the evidence r(t+1) in Y.
     d. Calibrate the potentials in Y.
     e. For each l, query Y for the marginal over X_l^(t+1) and store it in σ(t+1).
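Setting the clique-tree bookkeeping aside, the heart of the Boyen-Koller step is: form the approximate joint belief as a product of the stored marginals, do one exact propagate-and-condition step, and project back onto the marginals. The following is a minimal brute-force sketch of that idea for small discrete subprocesses; it is not the clique-tree algorithm above, and the names (bk_step, joint_T, obs_lik) and numbers are mine.

```python
import numpy as np

def bk_step(marginals, joint_T, obs_lik):
    """One factored (Boyen-Koller style) update for a tiny discrete DBN, by brute force.

    marginals : list of 1-D arrays, the stored per-subprocess marginals sigma^{(t)}
    joint_T   : transition matrix over the *joint* state, shape (N, N)
    obs_lik   : likelihood of the current evidence for each joint state, shape (N,)
    returns   : list of updated marginals sigma^{(t+1)} (the projection step)
    """
    sizes = [len(m) for m in marginals]
    # 1. Approximate the joint as a product of marginals (the factored belief state).
    joint = marginals[0]
    for m in marginals[1:]:
        joint = np.outer(joint, m).ravel()
    # 2. One exact step: propagate through T, condition on the evidence, renormalize.
    joint = (joint @ joint_T) * obs_lik
    joint /= joint.sum()
    # 3. Project back onto per-subprocess marginals (where the approximation error enters).
    joint = joint.reshape(sizes)
    return [joint.sum(axis=tuple(a for a in range(len(sizes)) if a != k))
            for k in range(len(sizes))]

# Two binary subprocesses: a made-up 4x4 joint transition and evidence likelihood.
rng = np.random.default_rng(0)
joint_T = rng.random((4, 4)); joint_T /= joint_T.sum(axis=1, keepdims=True)
obs_lik = rng.random(4)
marginals = [np.array([0.5, 0.5]), np.array([0.8, 0.2])]
print(bk_step(marginals, joint_T, obs_lik))
```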

Conclusion of Factored DBNs

• Accuracy-efficiency tradeoff:
  – A smaller partition =>
    • Faster inference
    • Better contraction
    • Worse approximation

• Key to a good approximation:
  – Discover weak/sparse interactions among subprocesses and factor the DBN along these lines
  – Domain knowledge helps

Agenda

• Factored inference in DBNs

• Sampling: Particle Filtering

A sneak peek at particle filtering

Introduction

• Analytical methods:
  – Kalman filter: linear-Gaussian models
  – HMM: models with finite state space

• Statistical approximation methods for non-parametric distributions and large discrete DBNs

• Different names:
  – Sequential Monte Carlo (Handschin and Mayne 1969; Akashi and Kumamoto 1975)
  – Particle filtering (Doucet et al. 1997)
  – Survival of the fittest (Kanazawa, Koller and Russell 1995)
  – Condensation in computer vision (Isard and Blake 1996)

Outline

• Importance Sampling (IS) revisited
  – Sequential IS (SIS)
  – Particle Filtering = SIS + Resampling

• Dynamic Bayesian Networks
  – A simple example: the ABC network

• Inference in DBNs:
  – Exact inference
  – Pure Particle Filtering
  – Rao-Blackwellised PF

• Demonstration in the ABC network
• Discussions

Importance Sampling Revisited

• Goal: evaluate a posterior expectation of the form
  $I(f) = E\big[ f(x_{0:k}) \mid y_{0:k} \big] = \int f(x_{0:k})\, p(x_{0:k} \mid y_{0:k})\, dx_{0:k}$

• Importance Sampling (batch mode):
  – Sample $x_{0:k}^{(i)}$ from an importance function $\pi(x_{0:k} \mid y_{0:k})$
  – Assign the importance ratio $w^{(i)} \propto p(x_{0:k}^{(i)} \mid y_{0:k}) \,/\, \pi(x_{0:k}^{(i)} \mid y_{0:k})$ as the weight of each sample
  – The posterior estimate of $I(f)$ is the weighted average $\hat{I}(f) = \sum_i \tilde{w}^{(i)} f(x_{0:k}^{(i)})$, with normalized weights $\tilde{w}^{(i)}$

Note (H. Zhou): the importance function's support must include that of the state posterior, and it must be normalized.
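A minimal sketch of batch importance sampling for a posterior expectation; the "posterior" and the proposal below are made-up Gaussian stand-ins chosen only so that the answer is easy to check.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_pdf(x):      # stand-in for the state posterior p(x | y); unnormalized is fine
    return np.exp(-0.5 * (x - 1.0) ** 2)    # because the weights are self-normalized below

def proposal_pdf(x):    # importance function pi(x | y); its support covers the target's
    return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))

N = 10_000
x = rng.normal(0.0, 3.0, size=N)            # sample from the proposal
w = target_pdf(x) / proposal_pdf(x)         # importance weights
w /= w.sum()                                # normalize
print(np.sum(w * x))                        # estimate of E[x | y] (should be near 1.0)
```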

Sequential Importance Sampling

• How to make it sequential?
• Choose the importance function to factorize recursively:
    $\pi(x_{0:k} \mid y_{0:k}) = \pi(x_{0:k-1} \mid y_{0:k-1})\, \pi(x_k \mid x_{0:k-1}, y_{0:k})$
  so that the weights can be updated recursively:
    $w_k^{(i)} \propto w_{k-1}^{(i)}\, \dfrac{p(y_k \mid x_k^{(i)})\, p(x_k^{(i)} \mid x_{k-1}^{(i)})}{\pi(x_k^{(i)} \mid x_{0:k-1}^{(i)}, y_{0:k})}$
• We get the SIS filter
• Benefit of SIS:
  – The observations y_k don't have to be given in batch
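A sketch of the SIS recursion under the common (assumed here) choice of importance function π(x_k | x_{0:k-1}, y_{0:k}) = p(x_k | x_{k-1}), for which the weight update reduces to multiplying by the observation likelihood; the scalar linear-Gaussian model and all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500                                    # number of samples (particles)

# Made-up scalar model: x_k = 0.9 x_{k-1} + noise,  y_k = x_k + noise.
def propagate(x): return 0.9 * x + rng.normal(0.0, 1.0, size=x.shape)
def lik(y, x):    return np.exp(-0.5 * (y - x) ** 2)

x = rng.normal(0.0, 1.0, size=N)           # initial samples
logw = np.zeros(N)                         # log importance weights

for y in [0.3, 1.1, 0.7]:                  # observations arrive one at a time
    x = propagate(x)                       # sample x_k ~ pi = p(x_k | x_{k-1})
    logw += np.log(lik(y, x))              # w_k = w_{k-1} * p(y_k | x_k)
    w = np.exp(logw - logw.max()); w /= w.sum()
    print("estimate:", np.sum(w * x))      # weighted posterior mean of x_k
```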

Resampling

• Why do we need to resample?
  – Degeneracy of SIS:
    • The variance of the importance weights (with y_{0:k} treated as a random variable) increases at each recursion step
  – The optimal importance function is p(x_k | x_{k-1}, y_k):
    • We would need to sample from p(x_k | x_{k-1}, y_k) and evaluate p(y_k | x_{k-1}), which is often not possible

• Resampling: eliminate particles with small weights and concentrate on particles with large weights

Resampling (cont)

• Measure of degeneracy: the effective sample size, estimated as
    $\hat{N}_{\mathrm{eff}} = 1 \big/ \sum_i \big(\tilde{w}_k^{(i)}\big)^2$

Note (H. Zhou): N_eff should be large enough; otherwise the variance of the sample weights will be too large. Proved by [Kong, Liu and Wong 1994].

Resampling Step

Particle filtering = SIS + Resampling
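Putting the pieces together, a bootstrap particle filter is SIS plus a resampling step triggered when the effective sample size drops too low. A sketch using the same made-up scalar model as above and systematic resampling (one common scheme among several); the threshold N/2 is also a conventional, not mandated, choice.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

def propagate(x): return 0.9 * x + rng.normal(0.0, 1.0, size=x.shape)
def lik(y, x):    return np.exp(-0.5 * (y - x) ** 2)

def systematic_resample(w):
    """Return particle indices drawn in proportion to the normalized weights w."""
    cs = np.cumsum(w)
    cs[-1] = 1.0                           # guard against floating-point round-off
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(cs, positions)

x = rng.normal(0.0, 1.0, size=N)
w = np.full(N, 1.0 / N)

for y in [0.3, 1.1, 0.7, -0.2]:
    x = propagate(x)                       # SIS: propagate through the transition prior
    w = w * lik(y, x)                      # SIS: reweight by the observation likelihood
    w /= w.sum()
    n_eff = 1.0 / np.sum(w ** 2)           # effective sample size
    if n_eff < N / 2:                      # degeneracy: resample, then reset the weights
        x = x[systematic_resample(w)]
        w = np.full(N, 1.0 / N)
    print("estimate:", np.sum(w * x))
```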

Rao-Blackwellisation for SIS

• A method to reduce the variance of the final posterior estimate
• Useful when the state can be partitioned as x_k = (x_k^1, x_k^2) such that one part (say x_k^2) can be marginalized analytically
• Assuming p(x_k^2 | x_{0:k}^1, y_{0:k}) can be evaluated analytically given x_{0:k}^1, one can rewrite the posterior estimate so that only x_k^1 is sampled while x_k^2 is handled exactly (e.g. by an HMM filter or a Kalman filter)
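Structurally, a Rao-Blackwellised particle filter carries, for each particle, a sampled trajectory of the "hard" part of the state plus an exact conditional filter over the tractable part. The sketch below is in the spirit of the ABC example (sample B with a particle filter, filter A exactly per particle); every CPT, name, and evidence value is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200                                      # particles over B only

# Made-up CPTs (all variables binary).
T_B = np.array([[0.8, 0.2], [0.3, 0.7]])     # P(B' | B)
T_A = np.array([[[0.9, 0.1], [0.6, 0.4]],    # P(A' | A, B'), indexed [B'][A][A']
                [[0.4, 0.6], [0.1, 0.9]]])
O_B = np.array([[0.7, 0.3], [0.2, 0.8]])     # P(y_B | B)
O_A = np.array([[0.6, 0.4], [0.1, 0.9]])     # P(y_A | A)

B = rng.integers(0, 2, size=N)               # sampled B value for each particle
alpha = np.full((N, 2), 0.5)                 # exact P(A | B-trajectory, evidence) per particle
w = np.full(N, 1.0 / N)

for y_B, y_A in [(0, 1), (1, 1), (1, 0)]:    # made-up evidence sequence
    # 1. Sample B' from its transition prior; reweight by its observation.
    B = np.array([rng.choice(2, p=T_B[b]) for b in B])
    w = w * O_B[B, y_B]
    # 2. Exact (HMM-style) update of the A-belief, conditional on the sampled B'.
    alpha = np.einsum('na,nab->nb', alpha, T_A[B]) * O_A[:, y_A]
    # 3. The evidence on A also reweights the particles (its marginal likelihood).
    w = w * alpha.sum(axis=1)
    alpha /= alpha.sum(axis=1, keepdims=True)
    w /= w.sum()
    print("P(A=1):", np.sum(w * alpha[:, 1]), " P(B=1):", np.sum(w * (B == 1)))
```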

Example: ABC network

Inference in DBN


Note (H. Zhou): … for the hidden variables; observation variables can have Gaussian distributions.

Exact inference in ABC network

Particle filtering

Rao-Blackwellised PF

Rao-Blackwellised PF (2)

Rao-Blackwellised PF (3)

Rao-Blackwellised PF (4)

Discussions

• Structure of the network:
  – A and C are dependent on B
  – y_t can also be separated into 3 independent parts
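One way to read this structural remark (my own spelling-out, with assumed notation y_t^A, y_t^B, y_t^C for the three evidence parts): because A and C interact only through B and each evidence part depends on its own variable, both the observation model and the conditional posterior factor, which is exactly what the Rao-Blackwellised filter exploits:

$P(y_t \mid A_t, B_t, C_t) = P(y_t^{A} \mid A_t)\, P(y_t^{B} \mid B_t)\, P(y_t^{C} \mid C_t)$

$P(A_t, C_t \mid B_{0:t}, y_{0:t}) = P(A_t \mid B_{0:t}, y^{A}_{0:t})\; P(C_t \mid B_{0:t}, y^{C}_{0:t})$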