Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf ·...

52
Time series models, Importance sampling and Sequential Monte Carlo A. Taylan Cemgil Signal Processing and Communications Lab. 08 March 2007 Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007

Transcript of Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf ·...

Page 1: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Time series models, Importance sampling andSequential Monte Carlo

A. Taylan Cemgil

Signal Processing and Communications Lab.

08 March 2007

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007

Page 2: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Outline

• Time Series Models and Inference

• Importance Sampling

• Resampling

• Putting it all together, Sequential Monte Carlo

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 1

Page 3: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Time series models and Inference, Terminology

In signal processing, applied physics, machine learning many phenomena aremodelled by dynamical models

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

xk ∼ p(xk|xk−1) Transition Model

yk ∼ p(yk|xk) Observation Model

• x are the latent states

• y are the observations

• In a full Bayesian setting, x includes unknown model parameters

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 2

Page 4: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Online Inference, Terminology

• Filtering: p(xk|y1:k)

– Distribution of current state given all past information– Realtime/Online/Sequential Processing

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

• Potentially confusing misnomer:

– More general than “digital filtering” (convolution) in DSP – butalgoritmically related for some models (KFM)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 3

Page 5: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Online Inference, Terminology

• Prediction p(yk:K, xk:K|y1:k−1)

– evaluation of possible future outcomes; like filtering withoutobservations

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

• Tracking, Restoration

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 4

Page 6: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Offline Inference, Terminology• Smoothing p(x0:K|y1:K),

Most likely trajectory – Viterbi path arg maxx0:Kp(x0:K|y1:K)

better estimate of past states, essential for learning

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

• Interpolation p(yk, xk|y1:k−1, yk+1:K)fill in lost observations given past and future

x0 x1 . . . xk−1 xk . . . xK

y1 . . . yk−1 yk . . . yK

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 5

Page 7: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Deterministic Linear Dynamical Systems

• The latent variables sk and observations yk are continuous

• The transition and observations models are linear

• Examples

– A deterministic dynamical system with two state variables– Particle moving on the real line,

sk =

(phaseperiod

)

k

=

(1 1

0 1

)

sk−1 = Ask−1

yk = phasek =(

1 0)sk = Csk

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 6

Page 8: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Kalman Filter Models, Stochastic Dynamical Systems

• We allow random (unknown) accelerations and observation error

sk =

(1 1

0 1

)

sk−1 + ǫk

= Ask−1 + ǫk

yk =(

1 0)sk + νk

= Csk + νk

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 7

Page 9: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Tracking

s0 s1 . . . sk−1 sk . . . sK

y1 . . . yk−1 yk . . . yK

• In generative model notation

sk ∼ N (sk;Ask−1, Q)

yk ∼ N (yk;Csk, R)

• Tracking = estimating the latent state of the system = Kalman filtering

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 8

Page 10: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

α1|0 = p(x1)

Per

iod

0 1 2 3 4 50

0.5

1

1.5

p(y k|y

1:k−

1)

Phase

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 9

Page 11: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

α1|1 = p(y1|x1)p(x1)

0 1 2 3 4 50

0.5

1

1.5

p(y k|y

1:k−

1)

Phase

Per

iod

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 10

Page 12: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

α2|1 =∫

dx1p(x2|x1)p(y1|x1)p(x1) ∝ p(x2|y1)

Per

iod

0 1 2 3 4 50

0.5

1

1.5

p(y k|y

1:k−

1)

Phase

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 11

Page 13: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

α2|2 = p(y2|x2)p(x2|y1)

0 1 2 3 4 50

0.5

1

1.5

p(y k|y

1:k−

1)

Phase

Per

iod

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 12

Page 14: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

α5|5 ∝ p(x5|y1:5)

0 1 2 3 4 50

0.5

1

1.5

p(y k|y

1:k−

1)

Phase

Per

iod

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 13

Page 15: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Nonlinear/Non-Gaussian Dynamical Systems

xk ∼ p(xk|xk−1) Transition Model

yk ∼ p(yk|xk) Observation Model

• What happens when the transition and/or observation model are non-Gaussian

• Apart from a handful of happy cases, the filtering density is not available inclosed form or costs a lot of memory to represent exactly

⇒ Need efficient and flexible numeric integration techniques

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 14

Page 16: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Nonlinear Dynamical System Example

• Noisy Sinusoidal with frequency modulation

∆k ∼ N (∆k;∆k−1, Q)

φk = φk−1 + ∆k

yk ∼ N (yk; sin(φk), R)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 15

Page 17: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Example:

−0.5

0

0.5

1

1.5Phase difference ∆

t/sec

f/Hz

Spectrogram

0 10 20 30 40 50 60 700

1000

2000

−2

−1

0

1

2

Signal yt

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16

Page 18: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Dynamical Sytems with switching

• Complicated processes can be modeled by using simple processes withoccasional regime switches

– Piecewise constant

0 10 20 30 40 50 60 70 80 90 100−5

0

5

10

15

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 17

Page 19: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Segmentation and Changepoint detection– Piecewise linear

0 20 40 60 80 100 120 140 160 180 200−10

−5

0

5

10

15

• Used for tracking, segmentation, changepoint detection ...

– What is the true state of the process given noisy data ?– Where are the changepoints ?– How many changepoints ?

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 18

Page 20: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Example: Conditionally Gaussian Changepoint Model

rk ∼ p(rk|rk−1) Changepoint flags ∈ {new, reg}

θk ∼ [rk = reg] f(θk|θk−1)︸ ︷︷ ︸

Transition

+[rk = new] π(θk)︸ ︷︷ ︸

Reinitialization

Latent State

yk ∼ p(yk|θk) Observations

r1 r2 r3 r4 r5

θ0 θ1 θ2 θ3 θ4 θ5

y1 y2 y3 y4 y5

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 19

Page 21: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Example: Piecewise constant signal

0 10 20 30 40 50 60 70 80 90 100−5

0

5

10

15

θ0 ∼ N (µ, P )

rk|rk−1 ∼ p(rk|rk−1)

θk|θk−1, rk ∼ [rk = 0]δ(θk − θk−1)︸ ︷︷ ︸

reg

+ [rk = 1]N (m, V )︸ ︷︷ ︸

new

yk|θk ∼ N (θk, R)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 20

Page 22: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Switching State space model

200 400 600 800 10000

2

4

6

8

10

12y

0 200 400 600 800 1000−2

0

2

4

6

8θ(2)

rk ∼ p(rk|rk−1) Regime label

θk ∼ N (θk;Arkθk−1, Qrk

)

yk ∼ N (yk; Cθk, R) Observations

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 21

Page 23: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Exact Inference in switching state space models

• In general, exact inference is intractable (NP hard)

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMM’s or KFM’s, summing over rk does not simplify the filteringdensity

⇒ Number of Gaussian kernels to represent exact filtering densityp(rk, θk|y1:k) increases exponentially

−7.90366.6343

0.76292

−10.3422

−10.1982−2.393

−2.7957

−0.4593

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 22

Page 24: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Sequential Monte Carlo - Particle Filtering

• We try to approximate the so-called filtering density with a set ofpoints/Gaussians ≡ particles

• Algorithms are intuitively similar to randomised search algorithms but arebest understood in terms of sequential importance sampling and resamplingtechniques

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 23

Page 25: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Importance Sampling

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 24

Page 26: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Importance Sampling (IS)

Consider a probability distribution with (possibly unknown) normalisation constant

p(x) =1

Zφ(x) Z =

dxφ(x).

IS: Estimate expectations (or features) of p(x) by a weighted sample

〈f(x)〉p(x) =

dxf(x)p(x)

〈f(x)〉p(x) ≈N∑

i=1

w̃(i)f(x(i))

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 25

Page 27: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Importance Sampling (cont.)

• Change of measure with weight function W (x) ≡ φ(x)/q(x)

〈f(x)〉p(x) =1

Z

dxf(x)φ(x)

q(x)q(x) =

1

Z

f(x)φ(x)

q(x)

q(x)

≡1

Z〈f(x)W (x)〉q(x)

• If Z is unknown, as is often the case in Bayesian inference

Z =

dxφ(x) =

dxφ(x)

q(x)q(x) = 〈W (x)〉q(x)

〈f(x)〉p(x) =〈f(x)W (x)〉q(x)

〈W (x)〉q(x)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 26

Page 28: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Importance Sampling (cont.)

• Draw i = 1, . . . N independent samples from q

x(i)∼ q(x)

• We calculate the importance weights

W(i)

= W (x(i)

) = φ(x(i)

)/q(x(i)

)

• Approximate the normalizing constant

Z = 〈W (x)〉q(x) ≈

NXi=1

W(i)

• Desired expectation is approximated by

〈f(x)〉p(x) =〈f(x)W (x)〉q(x)

〈W (x)〉q(x)

PNi=1 W (i)f(x(i))PN

i=1 W (i)≡

NXi=1

w̃(i)f(x(i))

Here w̃(i) = W (i)/

PNj=1 W (j) are normalized importance weights.

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 27

Page 29: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Importance Sampling (cont.)

−10 −5 0 5 10 15 20 250

0.1

0.2

−10 −5 0 5 10 15 20 250

10

20

30

−10 −5 0 5 10 15 20 250

0.1

0.2

φ(x)q(x)

W(x)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 28

Page 30: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Resampling

• Importance sampling computes an approximation with weighted delta functions

p(x) ≈∑

i

W̃ (i)δ(x− x(i))

• In this representation, most of W̃ (i) will be very close to zero and the representation may bedominated by few large weights.

• Resampling samples a set of new “particles”

x(j)new ∼

Xi

W̃ (i)δ(x− x(i))

p(x) ≈1

NX

j

δ(x− x(j)new)

• Since we sample from a degenerate distribution, particle locations stay unchanged. We merelydublicate (, triplicate, ...) or discard particles according to their weight.

• This process is also named “selection”, “survival of the fittest”, e.t.c., in various fields (Geneticalgorithms, AI..).

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 29

Page 31: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Resampling

−10 −5 0 5 10 15 20 250

0.1

0.2

−10 −5 0 5 10 15 20 250

10

20

30

−10 −5 0 5 10 15 20 250

1

2

−10 −5 0 5 10 15 20 250

0.1

0.2

φ(x)q(x)

W(x)

xnew

x(j)new ∼

i W̃ (i)δ(x− x(i))

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 30

Page 32: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

x y p(x|y) ∝ p(y|x)p(x)

• Prior as the proposal. q(x) = p(x)

W (x) =p(y|x)p(x)

p(x)= p(y|x)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 31

Page 33: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

x y p(x|y) ∝ p(y|x)p(x)

• Likelihood as the proposal. q(x) = p(y|x)/∫

dxp(y|x) = p(y|x)/c(y)

W (x) =p(y|x)p(x)

p(y|x)/c(y)= p(x)c(y) ∝ p(x)

• Interesting when sensors are very accurate and dim(y)≫ dim(x).

Since there are many proposals, is there a “best” proposal distribution?

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 32

Page 34: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Optimal Proposal Distribution

x y p(x|y) ∝ p(y|x)p(x)

Task: Estimate 〈f(x)〉p(x|y)

• IS constructs the estimator I(f) = 〈f(x)W (x)〉q(x)

• Minimize the variance of the estimator⟨

(f(x)W (x)− 〈f(x)W (x)〉)2⟩

q(x)=

⟨f2(x)W 2(x)

q(x)− 〈f(x)W (x)〉2q(x)(1)

=⟨f2(x)W 2(x)

q(x)− 〈f(x)〉2p(x) (2)

=⟨f2(x)W 2(x)

q(x)− I2(f) (3)

• Minimize the first term since only it depends upon q

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 33

Page 35: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Optimal Proposal Distribution

• (By Jensen’s inequality) The first term is lower bounded:

⟨f2(x)W 2(x)

q(x)≥ 〈|f(x)|W (x)〉2q(x) =

(∫

|f(x)| p(x|y)dx

)2

• We well look for a distribution q∗ that attains this lower bound. Take

q∗(x) =|f(x)|p(x|y)

∫|f(x′)|p(x′|y)dx′

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 34

Page 36: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Optimal Proposal Distribution (cont.)

• The weight function for this particular proposal q∗ is

W∗(x) = p(x|y)/q∗(x) =

∫|f(x′)|p(x′|y)dx′

|f(x)|

• We show that q∗ attains its lower bound

⟨f2(x)W 2

∗ (x)⟩

q∗(x)=

f2(x)

(∫|f(x′)|p(x′|y)dx′

)2

|f(x)|2

q∗(x)

=

(∫

|f(x′)|p(x′|y)dx′)2

= 〈|f(x)|〉2p(x|y)

= 〈|f(x)|W∗(x)〉2q∗(x)

• ⇒ There are distributions q∗ that are even “better” than the exact posterior!

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 35

Page 37: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

A link to alpha divergences

The α-divergence between two distributions is defined as

Dα(p||q) ≡1

β(1− β)

(

1−

dxp(x)βq(x)1−β

)

where β = (1 + α)/2 and p and q are two probability distributions

• limβ→0 Dα(p||q) = KL(q||p)

• limβ→1 Dα(p||q) = KL(p||q)

• β = 2, (α = 3)

D3(p||q) ≡1

2

dxp(x)2q(x)−1 −1

2=

1

2

⟨W (x)2

q(x)−

1

2

Best q (in a constrained family) is typically a heavy-tailed approximation to p

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 36

Page 38: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

x1 x2

p(x|y) ∝ p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

y1 y2

Task: Obtain samples from the posterior p(x1:2|y1:2) = 1Zy

φ(x1:2)

• Prior as the proposal. q(x1:2) = p(x1)p(x2|x1)

W (x1:2) =φ(x1:2)

q(x1:2)= p(y1|x1)p(y2|x2)

• We sample from the prior as follows:

x(i)1 ∼ p(x1) x

(i)2 ∼ p(x2|x1 = x

(i)1 ) W (x(i)) = p(y1|x

(i)1 )p(y2|x

(i)2 )

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 37

Page 39: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

φ(x1:2) = p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

• State prediction as the proposal. q(x1:2) = p(x1|y1)p(x2|x1)

W (x1:2) =p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

p(x1|y1)p(x2|x1)= p(y1)p(y2|x2)

• We sample from the proposal and compute the weight

x(i)1 ∼ p(x1|y1) x

(i)2 ∼ p(x2|x1 = x

(i)1 ) W (x(i)) = p(y1)p(y2|x

(i)2 )

• Note that this weight does not depend on x1

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 38

Page 40: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

φ(x1:2) = p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

• Filtering distribution as the proposal. q(x1:2) = p(x1|y1)p(x2|x1, y2)

W (x1:2) =p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

p(x1|y1)p(x2|x1, y2)= p(y1)p(y2|x1)

• We sample from the proposal and compute the weight

x(i)1 ∼ p(x1|y1) x

(i)2 ∼ p(x2|x1 = x

(i)1 , y2) W (x(i)) = p(y1)p(y2|x

(i)1 )

• Note that this weight does not depend on x2

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 39

Page 41: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Examples of Proposal Distributions

φ(x1:2) = p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

• Exact posterior as the proposal. q(x1:2) = p(x1|y1, y2)p(x2|x1, y2)

W (x1:2) =p(y1|x1)p(x1)p(y2|x2)p(x2|x1)

p(x1|y1)p(x2|x1, y2)= p(y1)p(y2|y1)

• Note that this weight is constant, i.e.⟨W (x1:2)

2⟩− 〈W (x1:2)〉

2 = 0

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 40

Page 42: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Variance reduction

q(x) W (x) = φ(x)/q(x)

p(x1)p(x2|x1) p(y1|x1)p(y2|x2)

p(x1|y1)p(x2|x1) p(y1)p(y2|x2)

p(x1|y1)p(x2|x1, y2) p(y1)p(y2|x1)

p(x1|y1, y2)p(x2|x1, y2) p(y1)p(y2|y1)

Accurate proposals

• gradually decrease the variance

• but take more time to compute

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 41

Page 43: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Sequential Importance Sampling, Particle Filtering

Apply importance sampling to the SSM to obtain some samples from the posteriorp(x0:K|y1:K).

p(x0:K|y1:K) =1

p(y1:K)p(y1:K|x0:K)p(x0:K) ≡

1

Zyφ(x0:K) (4)

Key idea: sequential construction of the proposal distribution q, possibly using theavailable observations y1:k, i.e.

q(x0:K|y1:K) = q(x0)K∏

k=1

q(xk|x1:k−1y1:k)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 42

Page 44: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Sequential Importance Sampling

Due to the sequential nature of the model and the proposal, the importance weightfunction W (x0:k) ≡Wk admits recursive computation

Wk =φ(x0:k)

q(x0:k|y1:k)=

p(yk|xk)p(xk|xk−1)

q(xk|x0:k−1y1:k)

φ(x0:k−1)

q(x0:k−1|y1:k−1)(5)

=p(yk|xk)p(xk|xk−1)

q(xk|x0:k−1, y1:k)Wk−1 ≡ uk|0:k−1Wk−1 (6)

Suppose we had an approximation to the posterior (in the sense 〈f(x)〉φ ≈

P

i W(i)k−1f(x

(i)0:k−1))

φ(x0:k−1) ≈∑

i

W(i)k−1δ(x0:k−1 − x

(i)0:k−1)

x(i)k ∼ q(xk|x

(i)0:k−1, y1:k) Extend trajectory

W(i)k = u

(i)k|0:k−1Wk−1 Update weight

φ(x0:k) ≈∑

i

W(i)k δ(x0:k − x

(i)0:k)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 43

Page 45: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Example

• Prior as the proposal density

q(xk|x0:k−1, y1:k) = p(xk|xk−1)

• The weight is given by

x(i)k ∼ p(xk|x

(i)k−1) Extend trajectory

W(i)k = u

(i)k|0:k−1Wk−1 Update weight

=p(yk|x

(i)k )p(x

(i)k |x

(i)k−1)

p(x(i)k |x

(i)k−1)

W(i)k−1 = p(yk|x

(i)k )W

(i)k−1

• However, this schema will not work, since we blindly sample from the prior. But...

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 44

Page 46: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Example (cont.)• Perhaps surprisingly, interleaving importance sampling steps with (occasional)

resampling steps makes the approach work quite well !!

x(i)k ∼ p(xk|x

(i)k−1) Extend trajectory

W(i)k = p(yk|x

(i)k )W

(i)k−1 Update weight

W̃(i)k = W

(i)k /Z̃k Normalize (Z̃k ≡

i′W

(i′)k )

x(j)0:k,new ∼

N∑

i=1

W̃ (i)δ(x0:k − x(i)0:k) Resample j = 1 . . . N

• This results in a new representation as

φ(x) ≈1

N

j

Z̃kδ(x0:k − x(j)0:k,new)

x(i)0:k ← x

(j)0:k,new W

(i)k ← Z̃k/N

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 45

Page 47: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Optimal proposal distribution

• The algorithm in the previous example is known as Bootstrap particle filter orSequential Importance Sampling/Resampling (SIS/SIR).

• Can we come up with a better proposal in a sequential setting?

– We are not allowed to move previous sampling points x(i)1:k−1 (because in

many applications we can’t even store them)

– Better in the sense of minimizing the variance of weight function Wk(x).(remember the optimality story in Eq.(3) and set f(x) = 1).

• The answer turns out to be the filtering distribution

q(xk|x1:k−1, y1:k) = p(xk|xk−1, yk) (7)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 46

Page 48: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Optimal proposal distribution (cont.)

• The weight is given by

x(i)k ∼ p(xk|x

(i)k−1, yk) Extend trajectory

W(i)k = u

(i)k|0:k−1W

(i)k−1 Update weight

u(i)k|0:k−1 =

p(yk|x(i)k )p(x

(i)k |x

(i)k−1)

p(x(i)k |x

(i)k−1, yk)

×p(yk|x

(i)k−1)

p(yk|x(i)k−1)

=p(yk, x

(i)k |x

(i)k−1)p(yk|x

(i)k−1)

p(x(i)k , yk|x

(i)k−1)

= p(yk|x(i)k−1)

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 47

Page 49: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

A Generic Particle Filter

1. Generation :Compute the proposal distribution q(xk|x

(i)0:k−1, y1:k).

Generate offsprings for i = 1 . . . N

x̂(i)k ∼ q(xk|x

(i)0:k−1, y1:k)

2. Evaluate importance weights

W(i)k =

p(yk|x̂(i)k )p(x̂

(i)k |x

(i)k−1)

q(x̂(i)k |x

(i)0:k−1, y1:k)

W(i)k−1 x

(i)0:k = (x̂

(i)k , x

(i)0:k−1)

3. Resampling (optional but recommended)

Normalize weigts W̃(i)k = W

(i)k /Z̃k Z̃k ≡

X

jW

(j)k

Resample x(j)0:k,new ∼

NXi=1

W̃ (i)δ(x0:k − x(i)0:k) j = 1 . . . N

Reset x(i)0:k ← x

(j)0:k,new W

(i)k ← Z̃k/N

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 48

Page 50: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Particle Filtering

0 2 4 6 8 10 12 14 16−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

τ

ω

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 49

Page 51: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

Summary

• Time Series Models and Inference

– Nonlinear Dynamical systems– Conditionally Gaussian Switching State Space Models– Change-point models

• Importance Sampling, Resampling

• Putting it all together, Sequential Monte Carlo

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 50

Page 52: Time series models, Importance sampling and Sequential Monte …cemgil/papers/5R1/smc-tutor.pdf · Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 16 Dynamical

The End

Slides are onlinehttp://www-sigproc.eng.cam.ac.uk/ ˜ atc27/papers/5R1/smc-tutor.pdf

Cemgil 5R1 Stochastic Processes, Introduction to SMC. March 08, 2007 51