Dependent processes in Bayesian Nonparametrics


Page 1: Dependent processes in Bayesian Nonparametrics

Dependent processes in Bayesian nonparametrics

Matteo Ruggiero

University of Torino and Collegio Carlo Alberto

Moncalieri, Feb 19 2016


Page 2: Dependent processes in Bayesian Nonparametrics

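1. Motivation and general setting

BNP and discrete random probability measures

$p = (p_1, p_2, \ldots)$: frequencies in

\[ \Delta_\infty = \Big\{ p \in [0,1]^\infty : \sum_i p_i = 1 \Big\} \]

$p^\downarrow = (p_{(1)}, p_{(2)}, \ldots)$: ordered frequencies in

\[ \nabla_\infty = \Big\{ p \in [0,1]^\infty : p_1 \ge p_2 \ge \cdots \ge 0,\ \sum_i p_i = 1 \Big\} \]

Assign a law to $p$, which induces a distribution on $\Delta_\infty$, $\nabla_\infty$.

Otherwise assign to the indices unique labels $X_1, X_2, \ldots \overset{iid}{\sim} P_0$, with $P_0$ continuous on $\mathbb{X}$, and define the discrete measure

\[ \sum_{i=1}^\infty p_i \delta_{X_i} \]

which induces a distribution on $\mathcal{P}(\mathbb{X})$.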

Matteo Ruggiero (Unito & CCA) Dependent processes in BNP 3

Page 3: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

BNP and discrete random probability measures

Approach 1: model observations $Y_j$ directly with

\[ p = (p_1, p_2, \ldots) \quad \text{or} \quad P = \sum_{i=1}^\infty p_i \delta_{X_i}, \]

where $Y_j = X_i$ w.p. $p_i$, and the $(X_i, p_i)$ are random.

Approach 2: use mixtures to yield more flexibility and possibly aim at continuous distributions,

\[ f(y) = \int_{\mathbb{X}} f(y \mid x) P(dx) \;\Rightarrow\; f(y) = \sum_{i=1}^\infty p_i f(y \mid X_i), \]

i.e. $Y_j \sim f(y \mid X_i)$ w.p. $p_i$, and the $(X_i, p_i)$ are random.

Use either approach as a base for estimation, uncertainty quantification, forecasting, clustering, ...

Page 4: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Motivation for dependent processes

Assumptions in the classical BNP approach:

- observations are exchangeable
- observations depend on a fixed environment/state of the world
- inference is static (fixed time)/carried out on a single environment

Data may not satisfy these assumptions (e.g. price dynamics)

Need for more general types of dependence

Page 5: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Partial exchangeability

Natural extension is partial exchangeability (de Finetti sense), e.g.

\[ \begin{array}{cccc}
X_{1,1} & X_{1,2} & X_{1,3} & \cdots \\
X_{2,1} & X_{2,2} & X_{2,3} & \cdots \\
X_{3,1} & X_{3,2} & X_{3,3} & \cdots \\
\cdots & \cdots & \cdots & \cdots
\end{array} \]

row-wise exchangeability (not overall): given $i$, the $X_{i,j}$ are exchangeable

Accommodates e.g. temporal structures

Collection of random probability measures, indexed by some covariate

Can be extended to an uncountable family

Page 6: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Dependent densities: discrete time

Page 7: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Dependent densities: discrete time

Page 8: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Dependent densities: continuous time

Page 9: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Dependent densities: continuous time

Page 10: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Dependent densities: continuous time

Page 11: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

Modelling and inference with time-dependent processes

Temporal dependence structure

Partial exchangeability: for any $t$ we have a distribution (possibly a mixture)

(Possibly multiple) data available at discrete time points

Model a collection of random probability measures, forming

- a discrete-time process, or
- a continuous-time process, with continuous paths or jumps

Nonparametric approach to allow for full flexibility

Analyse properties of the resulting model

Devise suitable strategies for posterior computation

Carry out inference on desired quantities

Page 12: Dependent processes in Bayesian Nonparametrics

1. Motivation and general setting

General setting

$X_1, X_2, \ldots \overset{iid}{\sim} P_0$: unique labels or locations in $\mathbb{X}$

We are interested in time-dependent random probability measures of type

\[ p(t) = (p_1(t), p_2(t), \ldots) \in \Delta_\infty \]

\[ p^\downarrow(t) = (p_{(1)}(t), p_{(2)}(t), \ldots) \in \nabla_\infty \]

\[ P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i(t)} \in \mathcal{P}(\mathbb{X}) \]

where $t \ge 0$ represents time.

Discrete sample paths: $p$, $p^\downarrow$, $P$ are countable collections of distributions, $t \in \mathbb{N}$

Continuous sample paths: $p$, $p^\downarrow$, $P$ are (random) $t$-continuous functions from $[0, \infty)$ to $\Delta_\infty$, $\nabla_\infty$ or $\mathcal{P}(\mathbb{X})$

Page 13: Dependent processes in Bayesian Nonparametrics

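2. Diffusive Dirichlet mixture models

Dirichlet process

The Dirichlet process [Ferguson 1973] extends the Dirichlet distribution from $K$ to infinitely many types

Can be defined via stick-breaking [Sethuraman 1994]

\[ V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \qquad p_i = V_i \prod_{k=1}^{i-1} (1 - V_k) \]

[Figure: the unit interval $[0, 1]$ broken successively. The first break gives $p_1 = V_1$ and leaves $1 - V_1$; a fraction $V_2$ of the remainder gives $p_2$ and leaves $(1 - V_1)(1 - V_2)$; then $V_3$, and so on.]

s.t. $p_i \to 0$ as $i \to \infty$ and $\sum_{i \ge 1} p_i = 1$

Take $X_i \overset{iid}{\sim} P_0$ with $P_0$ continuous on $\mathbb{X}$.

Then $P = \sum_{i=1}^\infty p_i \delta_{X_i}$ is a Dirichlet process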
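The stick-breaking construction can be sketched numerically. The snippet below is a minimal illustration, not code from the talk: it truncates the infinite sum at a finite number of atoms, and the names `stick_breaking_weights` and `sample_truncated_dp`, the value of theta and the uniform base measure P0 are all assumptions chosen for the example.

```python
import numpy as np

def stick_breaking_weights(theta, n_atoms, rng):
    """Truncated stick-breaking weights p_i = V_i * prod_{k<i} (1 - V_k)."""
    v = rng.beta(1.0, theta, size=n_atoms)  # V_i iid Beta(1, theta)
    # Length of stick still unbroken before the i-th break (1 for i = 1).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

def sample_truncated_dp(theta, n_atoms, base_sampler, rng):
    """Return (weights, atoms) of a truncated Dirichlet process draw."""
    weights = stick_breaking_weights(theta, n_atoms, rng)
    atoms = base_sampler(rng, n_atoms)  # X_i iid P0
    return weights, atoms

rng = np.random.default_rng(0)
weights, atoms = sample_truncated_dp(
    theta=5.0,
    n_atoms=200,
    # Assumption for illustration only: P0 = Uniform(0, 1).
    base_sampler=lambda r, n: r.uniform(0.0, 1.0, size=n),
    rng=rng,
)
print(weights[:5], weights.sum())
```

With 200 atoms and theta = 5 the mass left outside the truncation is negligible, so the weights sum to 1 up to floating-point error; smaller truncation levels or larger theta leave more mass unassigned.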

Page 14: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Dirichlet process

[Figure: horizontal axis $x$ from 0.0 to 1.0; vertical axis $p$ from 0.00 to 0.10.]

Page 15: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Dependent Dirichlet process

Basic idea [MacEachern, 1999]

We aim at defining a process

\[ P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i(t)}, \qquad t \ge 0, \]

with Dirichlet process marginals

Handling both $(p_1(t), p_2(t), \ldots)$ and $(X_1(t), X_2(t), \ldots)$ can be nontrivial. Consider instead

\[ P(t) = \sum_{i=1}^\infty p_i(t) \delta_{X_i}, \qquad t \ge 0, \qquad X_i \overset{iid}{\sim} P_0 \]

Atoms are fixed, but there are infinitely many of them

In practice, as many as you need

Page 16: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Diffusive Dirichlet process

Take the Dirichlet stick-breaking weights

\[ p_i = V_i \prod_{k=1}^{i-1} (1 - V_k), \qquad V_i \overset{iid}{\sim} \mathrm{Beta}(1, \theta) \]

Substitute each component $V_i \in [0, 1]$ with a diffusion $\{V_i(t)\}_{t \ge 0}$ on $[0, 1]$

Then take

\[ p_i(t) = V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t)) \]

Each component needs to have Beta marginals, $V_i(t) \sim \mathrm{Beta}(1, \theta)$

One-dimensional Wright–Fisher diffusions satisfy this
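A rough numerical sketch of this substitution follows. It is not the authors' algorithm: it assumes the Wright–Fisher parametrisation dV = 0.5 (a (1 - V) - b V) dt + sqrt(V (1 - V)) dW, whose stationary law is Beta(a, b), and it uses a plain Euler–Maruyama scheme with clipping to [0, 1], which is only an approximation of the diffusion. The function names, step size and truncation level are illustrative choices.

```python
import numpy as np

def simulate_wf_sticks(a, b, n_sticks, n_steps, dt, rng):
    """Euler-Maruyama paths of n_sticks independent Wright-Fisher diffusions.

    Assumed SDE: dV = 0.5 * (a * (1 - V) - b * V) dt + sqrt(V * (1 - V)) dW,
    with stationary law Beta(a, b). Paths are started from that law.
    """
    v = np.empty((n_steps + 1, n_sticks))
    v[0] = rng.beta(a, b, size=n_sticks)
    for s in range(n_steps):
        cur = v[s]
        drift = 0.5 * (a * (1.0 - cur) - b * cur)
        noise = np.sqrt(cur * (1.0 - cur) * dt) * rng.standard_normal(n_sticks)
        # Clipping keeps the discretised path inside [0, 1].
        v[s + 1] = np.clip(cur + drift * dt + noise, 0.0, 1.0)
    return v

def weights_from_sticks(v):
    """p_i(t) = V_i(t) * prod_{k<i} (1 - V_k(t)), computed for every time row."""
    remaining = np.cumprod(1.0 - v, axis=1)
    remaining = np.hstack([np.ones((v.shape[0], 1)), remaining[:, :-1]])
    return v * remaining

rng = np.random.default_rng(1)
theta = 3.0
sticks = simulate_wf_sticks(a=1.0, b=theta, n_sticks=50, n_steps=2000, dt=1e-3, rng=rng)
weights = weights_from_sticks(sticks)  # shape (n_steps + 1, n_sticks)
print(weights[0, :3], weights[-1, :3])
```

Each row of `weights` is a truncated weight vector at one time point; consecutive rows differ only slightly, which is the diffusive behaviour the construction aims for.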

Page 17: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Wright–Fisher diffusions

[Figure: horizontal axis from 0 to 10.]

Page 18: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Wright–Fisher diffusions

[Figure, first panel: % of type 1 individuals (mutation rates: theta_1 = 2, theta_2 = 8). Axes: time (50K steps), 0 to 10; state space, 0 to 1.]

[Figure, second panel: ergodic frequencies against stationary distribution Beta(2, 8). Axis: state space, 0.0 to 1.0.]

Page 19: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Wright–Fisher diffusions

[Figure, first panel: % of type 1 individuals (mutation rates: theta_1 = 8, theta_2 = 8). Axes: time (50K steps), 0 to 10; state space, 0 to 1.]

[Figure, second panel: ergodic frequencies against stationary distribution Beta(8, 8). Axis: state space, 0.0 to 1.0.]

Page 20: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Wright–Fisher diffusions

[Figure, first panel: % of type 1 individuals (mutation rates: theta_1 = 0.4, theta_2 = 0.4). Axes: time (50K steps), 0 to 10; state space, 0 to 1.]

[Figure, second panel: ergodic frequencies against stationary distribution Beta(0.4, 0.4). Axis: state space, 0.0 to 1.0.]

Page 21: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Diffusive Dirichlet process [Mena and R. 2016]

The resulting object

\[ P(t) = \sum_{i=1}^\infty \underbrace{\Big( V_i(t) \prod_{k=1}^{i-1} (1 - V_k(t)) \Big)}_{p_i(t)} \delta_{X_i}, \qquad V_i(t) \sim \mathrm{WF}(a, b) \]

- has Dirichlet marginals for $(a, b) = (1, \theta)$, i.e. $P(t)$ is a DP for all $t$
- has GEM marginals for $(a, b) \in \mathbb{R}_+^2$
- has diffusive behaviour: $P(t)$ is $t$-continuous in total variation

See also

- Gutierrez, Mena & R. 2016 (version with jumps)
- Mena, R. & Walker 2011 (geometric weights, different marginals)

for related models

Page 22: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Diffusive Dirichlet process

[Figure: axis from 0.0 to 1.0; time axis from 0 to 1, with ticks at 0, 0.029, 0.059, 0.088.]

Page 23: Dependent processes in Bayesian Nonparametrics

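2. Diffusive Dirichlet mixture models

Estimation

At each time $t_i$ we have observations $(y_{i,1}, \ldots, y_{i,n_i})$.

Set up the hierarchical mixture

\[ \{P_t, t \ge 0\} \sim \text{diff-DP or GSB} \]

\[ x_{t_i} \mid P_{t_i} \sim P_{t_i} \]

\[ y_{i,j} \mid t_i, x_{t_i} \overset{iid}{\sim} f(\cdot \mid x_{t_i}) \]

equivalently, each $y_{i,j}$ is drawn from the time-dependent nonparametric mixture model

\[ f_{t_i}(y) = \int_{\mathbb{X}} f(y \mid x) P_{t_i}(dx) = \sum_{k=1}^\infty p_k(t_i) f(y \mid X_k) \]

where the summation index is written $k$ to keep it distinct from the time index $i$.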
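To make the mixture density at a single observation time concrete, the sketch below evaluates a truncated version of it on a grid. It is an illustration under stated assumptions, not the estimation procedure of the talk: the kernel f(y | x) is taken to be Gaussian with a fixed standard deviation, the weights come from one static stick-breaking draw standing in for the weights at that time, and the atoms are drawn from an arbitrary normal base measure. All names and parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y (broadcasts over arrays)."""
    z = (y - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))

def mixture_density(y_grid, weights, atoms, sd=0.5):
    """Evaluate sum_k p_k * N(y | x_k, sd^2) on y_grid for truncated weights."""
    w = weights / weights.sum()  # renormalise the truncated weights
    kernels = normal_pdf(y_grid[:, None], atoms[None, :], sd)
    return (w[None, :] * kernels).sum(axis=1)

rng = np.random.default_rng(2)
v = rng.beta(1.0, 2.0, size=100)                      # sticks at one time point
weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
atoms = rng.normal(0.0, 3.0, size=100)                # assumed base measure N(0, 9)
y_grid = np.linspace(-20.0, 20.0, 4001)
density = mixture_density(y_grid, weights, atoms)
mass = density.sum() * (y_grid[1] - y_grid[0])        # Riemann-sum check of total mass
print(mass)
```

Repeating the evaluation with the weight vector at each observation time gives a sequence of densities, one per time point, which is the object the heat maps in the following slides display.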

Page 24: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Simulated data

True model

Page 25: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Simulated data

Single data points

[Figure: horizontal axis from 0 to 10; vertical axis from −2 to 8. True model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted).]

Page 26: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Simulated data

Multiple data points

[Figure: horizontal axis from 0 to 10; vertical axis from −2 to 8. True model (heat map), posterior mode (solid), 95% credible intervals for the mean (dashed), 95% quantiles of posterior density estimate (dotted).]

Page 27: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Real data: S&P 500 (03/08 - 02/09)

Dependent density estimate

Heat map of estimated density (red), and mean estimate (solid)

Page 28: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Real data: S&P 500 (03/08 - 02/09)

Dependent density estimate

[Figure: horizontal axis from 160 to 200; vertical axis from 800 to 1100.]

Page 29: Dependent processes in Bayesian Nonparametrics

2. Diffusive Dirichlet mixture models

Real data: S&P 500 (03/08 - 02/09)

Dependent density estimate

[Figure: horizontal axis from 160 to 200; vertical axis from 800 to 1100.]

Page 30: Dependent processes in Bayesian Nonparametrics

3. Dynamic models for evolving populations

A different view: modelling evolving populations

A sample path of $p^\downarrow(t) = (p_{(1)}(t), \ldots, p_{(7)}(t))$

[Figure: Dynamic frequencies of 7 species. Axes: Time; Frequency, from 0.1 to 0.7.]

Page 31: Dependent processes in Bayesian Nonparametrics

3. Dynamic models for evolving populations

A different view: modelling evolving populations

Distinct values $X_1, X_2, \ldots$ are interpreted as

- allelic types in genetics
- plant or animal species
- unique identifiers of some evolving groups

Large population → species abundances approximate diffusive behaviours

If one cannot provide an a priori upper bound, assume infinitely many species

Two different approaches:

- constructing stochastic models for pseudo-realistic evolutionary mechanisms (mutation, selection, recombination, migration, ...)
- studying the association between certain distributions and connected dynamics

Dynamics in figure are related to a Dirichlet distribution

Can we extend them? To what extent? With what interpretation?

[Figure: Dynamic frequencies of 7 species. Axes: Time; Frequency, from 0.1 to 0.7.]

Page 32: Dependent processes in Bayesian Nonparametrics

3. Dynamic models for evolving populations

Wright–Fisher signals: Dirichlet-Multinomial model

Page 33: Dependent processes in Bayesian Nonparametrics

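3. Dynamic models for evolving populations

Poisson-Dirichlet case

No. species $K$:

- Markov chain ($N$ individuals): Wright-Fisher($N, K, \theta$), Fisher (1930), Wright (1931)
- Diffusion ($\infty$ individuals): Wright-Fisher($K, \theta$), Sato (1976), limit of the Markov chain as $N \to \infty$ ($\xrightarrow{d}$)
- Random measure ($t$ fixed): $\mathrm{Dir}(\theta/K, \ldots, \theta/K)$; the diffusion is stationary w.r.t. it

No. species $\infty$:

- Markov chain ($N$ individuals): Moran($N, \theta$), Watterson (1976)
- Diffusion ($\infty$ individuals): IMNA($\theta$), Ethier and Kurtz (1981), limit of Moran($N, \theta$) as $N \to \infty$ and of Wright-Fisher($K, \theta$) as $K \to \infty$ ($\xrightarrow{d}$)
- Random measure ($t$ fixed): PD($\theta$), Kingman (1975), limit of $\mathrm{Dir}(\theta/K, \ldots, \theta/K)$ as $K \to \infty$ ($\xrightarrow{d}$); the diffusion is stationary w.r.t. it

"$\xrightarrow{d}$" = convergence in distribution

IMNA = infinitely many neutral alleles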

Page 34: Dependent processes in Bayesian Nonparametrics

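3. Dynamic models for evolving populations

Two-parameter Poisson-Dirichlet case

No. species $\infty$:

- Random measure ($t$ fixed): PD($\theta, \alpha$), Pitman (1995)
- Diffusion ($\infty$ individuals): IMNA($\theta, \alpha$), Petrov (2009), stationary w.r.t. PD($\theta, \alpha$)
- Markov chain ($N$ individuals): Moran($N, \theta, \alpha$), R. and Walker (2009), converging to the diffusion as $N \to \infty$ ($\xrightarrow{d}$)

No. species $K$:

- Markov chain ($N$ individuals): WF($N, K, \theta, \alpha$), Costantini, De Blasi, Ethier, R., Spano (2016)
- Diffusion ($\infty$ individuals): WF($K, \theta, \alpha$), Costantini, De Blasi, Ethier, R., Spano (2016), limit of the Markov chain as $N \to \infty$ and converging to IMNA($\theta, \alpha$) as $K \to \infty$ ($\xrightarrow{d}$)
- Random measure ($t$ fixed): stationary w.r.t. ??

Remarks:

- IMNA = infinitely many neutral alleles
- Based on Pitman's generalized Polya urn scheme
- Mutation and immigration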

Page 35: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Continuous-time Gamma-Poisson model

[Figure: CIR path X_t with observations from the Poisson(X_t) likelihood. Horizontal axis from 0 to 50; vertical axis from 0 to 35.]

Page 36: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Continuous-time Gamma-Poisson model

Page 37: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Continuous-time Gamma-Poisson model

Page 38: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

The propagation mixture

Prior: $X \sim \pi_\alpha := \mathrm{Gamma}(\alpha_1, \alpha_2)$

Likelihood: $Y \mid X \sim \mathrm{Poisson}(X)$

Posterior: $X \mid Y_1, \ldots, Y_n \sim \pi_{\alpha,n} := \mathrm{Gamma}\big(\alpha_1 + \sum_{i=1}^n y_i,\ \alpha_2 + n\big)$

Propagation mixture [Papaspiliopoulos & R. 2014]

\[ \psi_t(\pi_{\alpha,n}) := \int \pi_{\alpha,n}(x) P_t(x, dx') \]

is given by

\[ \psi_t(\pi_{\alpha,n}) = \sum_{j=0}^n p_t(n, j)\, \mathrm{Gamma}\Big(\alpha_1 + \sum_{i=1}^n y_i - j,\ \alpha_2 + n - s_t\Big) \]

for appropriate time-varying weights $p_t(n, j)$

Can be extended to infinite dimensional models [Papaspiliopoulos, R. & Spano 2016]
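The conjugate update step (prior to posterior, before any propagation in time) is simple enough to check by hand. The sketch below covers only that static step, with Gamma(shape, rate) parameters; it does not implement the propagation mixture, whose weights are not specified here. The function name and the numbers are illustrative.

```python
def gamma_poisson_update(alpha1, alpha2, observations):
    """Posterior parameters (shape, rate) of a Gamma(alpha1, alpha2) prior
    after observing iid Poisson(X) counts: Gamma(alpha1 + sum(y), alpha2 + n)."""
    n = len(observations)
    return alpha1 + sum(observations), alpha2 + n

# Illustrative prior Gamma(2, 1) and four made-up counts.
y = [3, 5, 4, 6]
post_shape, post_rate = gamma_poisson_update(2.0, 1.0, y)
posterior_mean = post_shape / post_rate
print(post_shape, post_rate, posterior_mean)  # prints 20.0 5.0 4.0
```

The posterior mean (alpha1 + sum of counts) / (alpha2 + n) moves from the prior mean 2.0 towards the sample mean 4.5 as counts accumulate.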

Page 39: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Continuous-time Gamma-Poisson model

[Figure: horizontal axis from 0 to 7; vertical axis from 0.1 to 0.5; label $t - t_0$.]

Page 40: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Continuous-time Gamma-Poisson model

[Figure: horizontal axis from 0 to 7; vertical axis from 0.1 to 0.5; label $t - t_0$.]

Page 41: Dependent processes in Bayesian Nonparametrics

4. Computing time dependent posteriors

Some references

Costantini, De Blasi, Ethier, R. & Spano (2016). Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion. arXiv:1601.06064

Gutierrez, Mena & R. (2016). A time dependent Bayesian nonparametric model for air quality analysis. Comput. Statist. Data Anal.

Mena & R. (2016). Dynamic density estimation with diffusive Dirichlet mixtures. Bernoulli

Mena, R. & Walker (2011). Geometric stick-breaking processes for continuous-time Bayesian nonparametric modeling. J. Statist. Plann. Inf.

Papaspiliopoulos & R. (2014). Optimal filtering and the dual process. Bernoulli

Papaspiliopoulos, R. & Spano (2014). Filtering hidden Markov measures. arXiv:1411.4944

R. & Walker (2009). Countable representation for infinite dimensional diffusions derived from the two-parameter Poisson–Dirichlet process. Electr. Comm. Probab.

For more info: www.matteoruggiero.it