Change-point Detection on a Tree to Study Evolutionary...

92
Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day Species ecile An´ e 1,2 , Paul Bastide 3,4 , Mahendra Mariadassou 4 , St´ ephane Robin 3 1 Department of Statistics, University of Wisconsin–Madison, WI, 53706, USA 2 Department of Botany, University of Wisconsin–Madison, WI, 53706, USA 3 UMR MIA-Paris, AgroParisTech, INRA, Universit´ e Paris-Saclay, 75005, Paris, France 4 MaIAGE, INRA, Universit´ e Paris-Saclay, 78352 Jouy-en-Josas, France 28 June 2016

Transcript of Change-point Detection on a Tree to Study Evolutionary...

Page 1: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Change-point Detection on a Tree to StudyEvolutionary Adaptation from Present-day

Species

Cecile Ane1,2, Paul Bastide3,4, Mahendra Mariadassou4,Stephane Robin3

1 Department of Statistics, University of Wisconsin–Madison, WI, 53706, USA2 Department of Botany, University of Wisconsin–Madison, WI, 53706, USA3 UMR MIA-Paris, AgroParisTech, INRA, Universite Paris-Saclay, 75005, Paris, France4 MaIAGE, INRA, Universite Paris-Saclay, 78352 Jouy-en-Josas, France

28 June 2016

Page 2: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Introduction0

Unit

200 150 100 50 0

Turtles phylogenetic tree with habitats.(Jaffe et al., 2011).

Dermochelys Coriacea

Homopus Areolatus

• How can we explain the diversity, while accounting for thephylogenetic correlations ?

• Modelling: a shifted stochastic process on the phylogeny.

CA, PB, MM, SR Change-point Detection on a Tree 2/17

Page 3: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Outline

1 Stochastic Processes on Trees

2 Identifiability Problems and Counting Issues

3 Statistical Inference

4 Turtles Data Set

CA, PB, MM, SR Change-point Detection on a Tree 3/17

Page 4: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Stochastic Process on a Tree (Felsenstein, 1985)

E

D

C

B

A

F

G

H

tAB

t

R

−4

−2

02

46

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

A

R

F

G

H

The tree is known.

Only tip values areobserved

Brownian Motion:

Var [A | R ] = σ2t

Cov [A; B | R ] = σ2tAB

CA, PB, MM, SR Change-point Detection on a Tree 4/17

Page 5: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Ornstein-Uhlenbeck Modeling (Hansen, 1997)

01

23

4

time

phen

otyp

e

−200 −150 −100 −50 0

W(t)β

(1 − e−αt)βt1 2 = ln(2) α dW (t) = α[β −W (t)]dt + σdB(t)

Deterministic part :

• β : primary optimum (mechanistically defined).

• ln(2)/α : phylogenetic half live.

Stochastic part :

• W (t) : trait value (actual optimum).

• σdB(t) : Brownian fluctuations.

CA, PB, MM, SR Change-point Detection on a Tree 5/17

Page 6: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

BM vs OU

Equation Stationary State Variance

−4

−2

01

time

phen

otyp

e

−200 −150 −100 −50 0

W(t)

dW (t) = σdB(t) None. σij = σ2tij

01

23

4

time

phen

otyp

e

−200 −150 −100 −50 0

W(t)β

(1 − e−αt)βt1 2 = ln(2) α

dW (t) = σdB(t)

+α[β−W (t)]dt

µ = β

γ2 =σ2

σij = γ2e−α(ti +tj )

× (e2αtij − 1)

CA, PB, MM, SR Change-point Detection on a Tree 6/17

Page 7: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Shifts

E

D

C

B

A

R

−4

−2

02

46

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

A

R

BM Shifts in the mean:

mchild = mparent + δ

−2

−1

01

2

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

AR

OU Shifts in the optimal value:

βchild = βparent + δ

CA, PB, MM, SR Change-point Detection on a Tree 7/17

Page 8: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Shifts

E

D

C

B

A

R

δ

−4

−2

02

46

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

A

R

BM Shifts in the mean:

mchild = mparent + δ

−2

−1

01

2

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

AR

OU Shifts in the optimal value:

βchild = βparent + δ

CA, PB, MM, SR Change-point Detection on a Tree 7/17

Page 9: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Shifts

E

D

C

B

A

R

δ

05

10

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

BA

E

D

C

BA

R

δ

BM Shifts in the mean:

mchild = mparent + δ

−2

−1

01

2

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

B

AR

OU Shifts in the optimal value:

βchild = βparent + δ

CA, PB, MM, SR Change-point Detection on a Tree 7/17

Page 10: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Principle of the ModelingShifts

Shifts

E

D

C

B

A

R

δ

05

10

time

phen

otyp

e

−200 −150 −100 −50 0

E

D

C

BA

E

D

C

BA

R

δ

BM Shifts in the mean:

mchild = mparent + δ

02

46

time

phen

otyp

e

−200 −150 −100 −50 0

ED

C

BA

ED

C

BAR

δ

OU Shifts in the optimal value:

βchild = βparent + δ

CA, PB, MM, SR Change-point Detection on a Tree 7/17

Page 11: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Equivalencies

• Number of shifts K fixed, several equivalent solutions.

µ

δ1 δ2

µ+δ2µ+δ1

µ

δ2 − δ1

δ1

• Problem of over-parametrization: parsimonious configurations.

CA, PB, MM, SR Change-point Detection on a Tree 8/17

Page 12: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Equivalencies

• Number of shifts K fixed, several equivalent solutions.

µ

δ1 δ2

µ+δ2µ+δ1

µ

δ2 − δ1

δ1

µ+ δ1

δ1 − δ2

µ+ δ2

µ+δ2µ+δ1

δ2 − δ1

• Problem of over-parametrization: parsimonious configurations.

CA, PB, MM, SR Change-point Detection on a Tree 8/17

Page 13: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 14: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 15: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 16: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 17: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 18: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 19: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Parsimonious Solution : Definition

Definition (Parsimonious Allocation)

A coloring of the tips being given, a parsimonious allocation of theshifts is such that it has a minimum number of shifts.

CA, PB, MM, SR Change-point Detection on a Tree 9/17

Page 20: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Equivalent Parsimonious Allocations

Definition (Equivalency)

Two allocations are said to be equivalent (noted ∼) if they areboth parsimonious and give the same colors at the tips.

Find one solution Several existing Dynamic Programmingalgorithms (Fitch, Sankoff, see Felsenstein, 2004).

Enumerate all solutions New recursive algorithm, adapted fromprevious ones (and implemented in R).

+

CA, PB, MM, SR Change-point Detection on a Tree 10/17

Page 21: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Equivalent Parsimonious Solutions for an OU Model.

06

5 1

0.53

−6

4.16 1

3.16

−35.76

−29.76

1

3.16

−4.16−6

1

3.16

−56

1

Equivalent allocations and values of the shifts - OU.

CA, PB, MM, SR Change-point Detection on a Tree 11/17

Page 22: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Identifiability ProblemsNumber of Parsimonious SolutionsNumber of Models with K Shifts

Collection of Models

New Problem Number of Equivalence Classes:∣∣SPI

K

∣∣ ?

•∣∣SPI

K

∣∣ ≤ (m+n−1K

)=(# of edges

# of shifts

)• A recursive algorithm to compute

∣∣SPIK

∣∣ (implemented in R).

7→ Generally dependent on the topology of the tree. +

• Binary tree:∣∣SPI

K

∣∣ =(2n−2−K

K

)=(# of edges−# of shifts

# of shifts

)

CA, PB, MM, SR Change-point Detection on a Tree 12/17

Page 23: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

EM Algorithm: number of shifts K fixed

Y5

Y4

Y3

Y2

Y1

Z1

Z4

Z2

Z3

`7

δ

`4

BM Z4|Z1∼ N(Z1 , σ2`4

)Y3|Z2∼ N

(Z2 + δ, σ2`7

)

pθ(Z ,Y ) = pθ(Z1)∏

1<j≤m

pθ(Zj |Zparent(j))∏

1≤i≤n

pθ(Yi |Zparent(i))

EM Recursive algorithm to find θK = argmaxη∈SPIKpθη

(Y ):

E step Given θh, compute pθh (Z | Y )

(G)M step θh+1 raises Eθh [ log pθ(Z ,Y ) | Y ]

+

CA, PB, MM, SR Change-point Detection on a Tree 13/17

Page 24: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.9

2

−2

5

2

2

Simulated OU (α =3, γ2 =0.1)

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 25: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

0.3

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 26: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−0.3

5.3

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 27: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

2

4.7

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 28: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

2.4

−2.6

4.3

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 29: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

2.2

−2.4

4.5

1.3

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 30: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

1.9

−2

4.9

1.6

1.6

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 31: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

1.9

−2

4.7 0.8

1.6

1.6

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 32: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.7

1.8

−2

4.8 0.81.1

1.7

1.7

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 33: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection on K

0

Unit

−1.6

1.7

0.8

−2 0.8

4.80.8

0.90.8 1.2

0.4

1.8 −1

−16.5

1.8

−0.7

θK = argmaxη∈SPI

K

pθη(Y )

−100

−50

0

50

0 5 10 15K

Log

Like

lihoo

d

LL = log pθK(Y )

CA, PB, MM, SR Change-point Detection on a Tree 14/17

Page 34: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection: Penalized Likelihood

Idea K = argmax0≤K≤Kmax

log pθK

(Y )− pen(K )

−100

−50

0

50

0 5 10 20K

Pen

aliz

ed L

L

criteria

LL

CA, PB, MM, SR Change-point Detection on a Tree 15/17

Page 35: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection: Penalized Likelihood

Idea K = argmax0≤K≤Kmax

log pθK

(Y )− pen(K )

−100

−50

0

50

0 5 10 20K

Pen

aliz

ed L

L

criteria

LL

AIC

Penalties:

AIC K + 3

CA, PB, MM, SR Change-point Detection on a Tree 15/17

Page 36: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection: Penalized Likelihood

Idea K = argmax0≤K≤Kmax

log pθK

(Y )− pen(K )

−100

−50

0

50

0 5 10 20K

Pen

aliz

ed L

L criteria

LL

AIC

BIC

Penalties:

AIC K + 3

BIC 12(K + 3) log(n)

CA, PB, MM, SR Change-point Detection on a Tree 15/17

Page 37: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

EM AlgorithmModel Selection

Model Selection: Penalized Likelihood

Idea K = argmax0≤K≤Kmax

log pθK

(Y )− pen(K )

−100

−50

0

50

0 5 10 20K

Pen

aliz

ed L

L criteria

LL

AIC

BIC

LINselect

Penalties:

AIC K + 3

BIC 12(K + 3) log(n)

LINselect pen(n,K ,∣∣SPI

K

∣∣)Based on Baraud et al. (2009)

+

CA, PB, MM, SR Change-point Detection on a Tree 15/17

Page 38: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Turtles Dataset

0

Unit

200 150 100 50 0

FreshwaterIslandMainlandSaltwater

Colors: habitats.Boxes: selected EM regimes.

Habitat EMNo. of shifts 16 5

No. of regimes 4 6lnL -133.86 -97.59

ln 2/α (%) 7.44 5.43σ2/2α 0.33 0.22

CPU t (min) 65.25 134.49

(Jaffe et al., 2011)

CA, PB, MM, SR Change-point Detection on a Tree 16/17

Page 39: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Turtles Dataset

0

Unit

200 150 100 50 0

FreshwaterIslandMainlandSaltwater

Colors: habitats.Boxes: selected EM regimes.

Chelonia mydas

CA, PB, MM, SR Change-point Detection on a Tree 16/17

Page 40: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Turtles Dataset

0

Unit

200 150 100 50 0

FreshwaterIslandMainlandSaltwater

Colors: habitats.Boxes: selected EM regimes.

Geochelone nigra abingdoni

CA, PB, MM, SR Change-point Detection on a Tree 16/17

Page 41: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Turtles Dataset

0

Unit

200 150 100 50 0

FreshwaterIslandMainlandSaltwater

Colors: habitats.Boxes: selected EM regimes.

Chitra indica

CA, PB, MM, SR Change-point Detection on a Tree 16/17

Page 42: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Stochastic Processes on TreesIdentifiability Problems and Counting Issues

Statistical InferenceTurtles Data Set

Conclusion and Perspectives

A general inference framework for trait evolution models.

Conclusions • Identifiability can be assessed.• An EM can be written to maximize likelihood.• Model selection for a non-iid framework.

R codes Available on GitHub:https://github.com/pbastide/PhylogeneticEM

Perspectives • Multivariate traits.• Deal with uncertainty (tree, data).• Phylogenetic networks.

CA, PB, MM, SR Change-point Detection on a Tree 17/17

Page 43: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Bibliography

Y. Baraud, C. Giraud, and S. Huet. Gaussian Model Selection with an Unknown Variance. The Annals of Statistics,37(2):630–672, Apr. 2009.

J.-P. Baudry, C. Maugis, and B. Michel. Slope Heuristics: Overview and Implementation. Statistics andComputing, 22(2):455–470, March 2012.

V. Brault, J.-P. Baudry, C. Maugis, and B. Michel. capushe: Capushe, Data-Driven Slope Estimation andDimension Jump. R package version 1.0, 2012.

J. Felsenstein. Phylogenies and the Comparative Method. The American Naturalist, 125(1):1–15, Jan. 1985.

J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Suderland, USA, 2004.

T. F. Hansen. Stabilizing Selection and the Comparative Analysis of Adaptation. Evolution, 51(5):1341–1351,October 1997.

A. L. Jaffe, G. J. Slater, and M. E. Alfaro. The Evolution of Island Gigantism and Body Size Variation in Tortoisesand Turtles. Biology letters, 11(11), November 2011.

P. Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics.Springer Berlin Heidelberg, 2007.

J. C. Uyeda and L. J. Harmon. A Novel Bayesian Method for Inferring and Interpreting the Dynamics of AdaptiveLandscapes from Phylogenetic Comparative Data. Systematic Biology, 63(6):902–918, July 2014.

Photo Credits :- ”Parrot-beaked Tortoise Homopus areolatus CapeTown 8” by Abu Shawka - Own work. Licensed under CC0 viaWikimedia Commons- ”Leatherback sea turtle Tinglar, USVI (5839996547)” by U.S. Fish and Wildlife Service Southeast Region -Leatherback sea turtle/ Tinglar, USVIUploaded by AlbertHerring. Licensed under CC BY 2.0 via WikimediaCommons- ”Hawaii turtle 2” by Brocken Inaglory. Licensed under CC BY-SA 3.0 via Wikimedia Commons- ”Dudhwalive chitra” by Krishna Kumar Mishra — Own work. Licensed under CC BY 3.0 via Wikimedia Commons- ”Lonesome George in profile” by Mike Weston - Flickr: Lonesome George 2. Licensed under CC BY 2.0 viaWikimedia Commons- ”Florida Box Turtle Digon3a”, ”Jonathan Zander (Digon3)” derivative work: Materialscientist

Page 44: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

Thank you for listening

pbastide.github.io

Page 45: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Appendices

5 Inference• EM• Model Selection

6 Identifiability Issues• Cardinal of Equivalence Classes• Number of Tree Compatible Clustering

7 Simulations Results

8 Multivariate• Models• Inference

CA, PB, MM, SR Change-point Detection on a Tree 3/42

Page 46: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

E step

Y5

Y4

Y3

Y2

Y1

Z1

Z4

Z2

Z3

`7

δ

`4

Compute the following quantities:

E(h)[Zj | Y ], Var(h) [Zj | Y ] , Cov(h) [Zj ,Zparent(j) | Y]

• Using Gaussian properties. Need to invert matrices:complexity in O(n3).

• Using Gaussian properties and the tree structure:”Upward-Downward” algorithm. Complexity in O(n).

+

CA, PB, MM, SR Change-point Detection on a Tree 4/42

Page 47: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step

Maximize:

E [ log pθ(X ) | Y ] = −m+n∑j=2

Cj (α, shifts) + F (h)(µ, γ2, σ2, α

)

• µ, γ2, σ2: simple maximization

• Discrete location of K shifts

7→ Exact and fast for the BM +

• α: numerical maximization and/or on a grid

7→ Generalized EM

CA, PB, MM, SR Change-point Detection on a Tree 5/42

Page 48: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Initialization

Shifts : Lasso regression.

∆ = argmin∆

‖Y − TW (α)∆‖2

Σ−1YY

+ λ ‖∆−1‖1

• Initialize ΣYY (α), then estimate ∆ with a Gauss Lasso

procedure, using a Cholesky decomposition.+

• λ chosen to get K shifts.

The selection strength α : Initialization using couples of tips.

back

CA, PB, MM, SR Change-point Detection on a Tree 6/42

Page 49: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Cholesky Decomposition

The problem is:

∆ = argmin∆

‖Y − R∆‖2

ΣYY+ λ |∆−1|1

Cholesky decomposition of ΣYY :

ΣYY = LLT , L a lower triangular matrix

Then:‖Y − R∆‖2

ΣYY=∥∥L−1Y − L−1R∆

∥∥2

And if Y ′ = L−1Y and R ′ = L−1R, the problem becomes:

∆ = argmin∆

∥∥Y ′ − R ′∆∥∥2

+ λ |∆−1|1

CA, PB, MM, SR Change-point Detection on a Tree 7/42

Page 50: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Gauss Lasso

Let mλ be the set of selected variables (including the root). Then:

∆Gauss = ΠFλ(Y ′) with Fλ = SpanR ′j : j ∈ mλ

back

CA, PB, MM, SR Change-point Detection on a Tree 8/42

Page 51: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Goal and Notations

Data A process on a tree with the following structure:

∀j > 1, Xj |Xpa(j) ∼ N(mj (Xpa(j)) = qjXpa(j) + rj , σ

2j

)

BM:

qj = 1

rj =∑

k

Iτk = bjδk

σ2j = `jσ

2

OU:

qj = e−α`j

rj = βpa(j)(1− e

−α`j ) +∑

k

Iτk = bjδk

(1− e

−α(1−νk )`j)

σ2j =

σ2

2α(1− e

−2α`j )

Goal Compute the following quantities, at every node j :

Var(h) [Zj | Y ] ,Cov(h) [Zj ,Zpa(j) | Y],E(h)[Zj | Y ]

CA, PB, MM, SR Change-point Detection on a Tree 9/42

Page 52: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Upward

Goal Compute for a vector of tips, given their common ancestor:

f Yj |Xj(Yj ; a) = Aj (Yj )ΦMj (Yj ),S2

j (Yj )(a)

Initialization For tips: f Yi |Yi(Yi ; a) = ΦYi ,0(a)

Propagation

f Yj |Xj(Yj ; a) =

L∏l=1

f Yjl |Xj(Yjl ; a)

f Yjl |Xj(Yjl ; a) =

∫R

fYjl∣∣∣Xjl

(Yjl ; b)fXjl

∣∣∣Xj(b; a)db

Root Node and Likelihood At the root:

f X1|Y (a; Y) ∝ f Y|X1(Y; a)fX1

(a)Var [ X1 | Y ] =

(1

γ2+

1

S21 (Y)

)−1

E [ X1 | Y ] = Var [ X1 | Y ]

γ2+

M1(Y)

S21 (Y)

)

XjlXj1

XjL· · · · · ·

Xj

Yj1 Yjl YjL

CA, PB, MM, SR Change-point Detection on a Tree 10/42

Page 53: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Downward

Compute Ej = E[

Xj

∣∣ Y], V 2

j = Var[

Xj

∣∣ Y], C 2

j,pa(j) = Cov[Xj ; Xpa(j)

∣∣ Y]

Initialization Last step of Upward.

Propagation

f Xpa(j),Xj |Y (a, b; Y) = f Xpa(j)|Y (a; Y)f Xj |Xpa(j),Y(b; a,Y)

f Xj |Xpa(j),Y(b; a,Y) = f Xj |Xpa(j),Y

j (b; a,Yj )

∝ f Xj |Xpa(j)(b; a)f Yj |Xj

(Yj ; b)

XjlXj1

XjL· · · · · ·

Xj

Yj1 Yjl YjL

CA, PB, MM, SR Change-point Detection on a Tree 11/42

Page 54: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Formulas

Upward S2

j (Yj ) =

(L∑

l=1

q2jl

S2jl

(Yjl ) + σ2jl

)−1

Mj (Yj ) = S2j (Yj )

L∑l=1

qjl

Mjl (Yjl )− rjl

S2jl

(Yjl ) + σ2jl

Downward

C 2j,pa(j) = qj

S2j (Yj )

S2j (Yj ) + σ2

j

V 2pa(j)

Ej =S2

j (Yj )(qj Epa(j) + rj ) + σ2j Mj (Yj )

S2j (Yj ) + σ2

j

V 2j =

S2j (Yj )

S2j (Yj ) + σ2

j

(σ2

j + p2j

S2j (Yj )

S2j (Yj ) + σ2

j

V 2pa(j)

)

back

CA, PB, MM, SR Change-point Detection on a Tree 12/42

Page 55: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step: Segmentation

Cj (α, τ, δ) = σ−2j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− rj − sj

∑k

Iτk = bjδk

)2

BM : rj = 0, each cost is independent.

C 0j (α) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y] )2

C 1j (α, τ, δ) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− sj

∑k

Iτk = bjδk

)2

Algorithm:

1 Find the K branches j1, . . . , jK with largest C 0j ;

2 Allocate one change point in the first K branches;

3 For each of these branches, set δ(h+1)jk

so that C 1j (τ, δ) = 0

CA, PB, MM, SR Change-point Detection on a Tree 13/42

Page 56: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step: Segmentation

Cj (α, τ, δ) = σ−2j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− rj − sj

∑k

Iτk = bjδk

)2

BM : rj = 0, each cost is independent.

C 0j (α) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y] )2

C 1j (α, τ, δ) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− sj

∑k

Iτk = bjδk

)2

Algorithm:

1 Find the K branches j1, . . . , jK with largest C 0j ;

2 Allocate one change point in the first K branches;

3 For each of these branches, set δ(h+1)jk

so that C 1j (τ, δ) = 0

CA, PB, MM, SR Change-point Detection on a Tree 13/42

Page 57: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step: Segmentation

Cj (α, τ, δ) = σ−2j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− rj − sj

∑k

Iτk = bjδk

)2

BM : rj = 0, each cost is independent.

C 0j (α) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y] )2

C 1j (α, τ, δ) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− sj

∑k

Iτk = bjδk

)2

Algorithm:

1 Find the K branches j1, . . . , jK with largest C 0j ;

2 Allocate one change point in the first K branches;

3 For each of these branches, set δ(h+1)jk

so that C 1j (τ, δ) = 0

CA, PB, MM, SR Change-point Detection on a Tree 13/42

Page 58: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step: Segmentation

Cj (α, τ, δ) = σ−2j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− rj − sj

∑k

Iτk = bjδk

)2

BM : rj = 0, each cost is independent.

C 0j (α) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y] )2

C 1j (α, τ, δ) = σ−2

j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− sj

∑k

Iτk = bjδk

)2

Algorithm:

1 Find the K branches j1, . . . , jK with largest C 0j ;

2 Allocate one change point in the first K branches;

3 For each of these branches, set δ(h+1)jk

so that C 1j (τ, δ) = 0

CA, PB, MM, SR Change-point Detection on a Tree 13/42

Page 59: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

M Step: Segmentation

Cj (α, τ, δ) = σ−2j

(E [ Xj | Y ]− qjE

[Xpa(j)

∣∣ Y]− rj − sj

∑k

Iτk = bjδk

)2

OU : rj = βpa(j), a cost depends on all its parents.

• Exact minimization: too costly.

• Need of an heuristic.

• Idea: rewrite as a least square:

‖D − AU∆‖2

with D a vector of size n + m, A a diagonal matrix of size n + m, ∆ thevector of shifts and U the incidence matrix of the tree.

• Then use Stepwise selection or LASSO. back

CA, PB, MM, SR Change-point Detection on a Tree 14/42

Page 60: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Model Selection on K : LINselect

Goal

K = argmin0≤K≤p−1

∥∥∥Y − YK

∥∥∥2

V

(1 +

pen(K )

n − K − 1

)Oracle

infη∈⋃p−1

K=0 SPIK

∥∥E [Y ]− Y ∗η∥∥2

V

Definition (Baraud et al. (2009))

Let D, N > 0, and XD ∼ χ2(D), XN ∼ χ2(N), XD ⊥ XN .

Dkhi[D,N, x ] =1

E [XD ]E[(

XD − xXN

N

)+

], ∀x > 0

Dkhi[D,N,EDkhi[D,N, q]] = q, ∀0 < q ≤ 1

CA, PB, MM, SR Change-point Detection on a Tree 15/42

Page 61: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Proposition: LINselect Penalty

Proposition (Form of the Penalty and guarantees (α known))

Under our setting: Y = TW (α)∆ + γE with E ∼ N (0,V ), define the penalty:

pen(K) = An − K − 1

n − K − 2EDkhi

[K + 2, n − K − 2, exp

(− log

∣∣∣SPIK

∣∣∣− 2 log(K + 2))]

If κ < 1, and p ≤ min(

κn2+log(2)+log(n)

, n − 7)

, we get:

E

∥∥∥E [Y ]− YK

∥∥∥2

V

γ2

≤ C(A, κ) infη∈M

∥∥E [Y ]− Y ∗η∥∥2

V

γ2+ (Kη + 2) (3 + log(n))

with C(A, κ) a constant depending on A and κ only.

back ?

CA, PB, MM, SR Change-point Detection on a Tree 16/42

Page 62: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

LINselect Model Selection: Important Points

Based on Baraud, Giraud, and Huet (2009)

• Non-asymptotic bound.• Unknown variance.• No constant to be calibrated.

Novelties • Non iid variance.• Penalty depends on the tree topology

(through∣∣SPI

K

∣∣).

CA, PB, MM, SR Change-point Detection on a Tree 17/42

Page 63: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Model Selection with Unknown Variance

Theorem (Baraud et al. (2009))

Under the following setting:

Y ′ = E[Y ′]

+ γE ′ with E ′ ∼ N (0, In) and S′ = S ′η , η ∈M

If Dη = Dim(S′η), Nη = n − Dη ≥ 7, max(Lη,Dη) ≤ κn, with κ < 1, and:

Ω′ =∑η∈M

(Dη + 1)e−Lη < +∞

If: η = argminη∈M

∥∥∥Y ′ − Y ′η

∥∥∥2(

1 +pen(η)

)

with: pen(η) = penA,L(η) = ANη

Nη − 1EDkhi[Dη + 1,Nη − 1, e−Lη ] , A > 1

Then: E

∥∥∥E [Y ′]− Y ′η

∥∥∥2

γ2

≤ C(A, κ)

[infη∈M

∥∥E [Y ′]− Y ′η∥∥2

γ2+ max(Lη ,Dη)

+ Ω′

]

CA, PB, MM, SR Change-point Detection on a Tree 18/42

Page 64: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

IID Framework (α = 0)

Assume Kη = Dη − 1 ≤ p − 1 ≤ n − 8, ∀η ∈M

Then:

Ω′ =∑η∈M

(Dη + 1)e−Lη =∑η∈M

(Kη + 2)e−Lη

=

p−1∑K=0

∣∣∣SPIK

∣∣∣ (K + 2)e−LK =

p−1∑K=0

∣∣∣SPIK

∣∣∣ (K + 2)e−(log

∣∣∣SPIK

∣∣∣+2 log(K+2))

=

p−1∑K=0

1

K + 2≤ log(p) ≤ log(n)

And:

LK ≤ log(n + m − 1

K

)+2 log(K +2) ≤ K log(n+m−1)+2(K +1) ≤ p(2+log(2n−2))

Hence, if p ≤ min(

κn2+log(2)+log(n)

, n − 7)

, then max(Lη ,Dη) ≤ κn for any η ∈M.

CA, PB, MM, SR Change-point Detection on a Tree 19/42

Page 65: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

EMModel Selection

Non-IID Framework (α 6= 0)

Cholesky decomposition: V = LLT Y ′ = L−1Y s ′ = L−1s E ′ = L−1E

Y ′ = E[Y ′]

+ γE ′, with: E ′ ∼ N (0, In)

S ′η = L−1Sη, Y ′η = ProjS′ηY ′ = argmin

a′∈S′η

∥∥Y − La′∥∥2

V= L−1Yη

∥∥∥E [Y ]− Yη

∥∥∥2

V=∥∥∥E [Y ′]− Y ′η

∥∥∥2

,∥∥∥Y − Yη

∥∥∥2

V=∥∥∥Y ′ − Y ′η

∥∥∥2

CritMC (η) =∥∥∥Y ′ − Y ′η

∥∥∥2(

1 +penA,L(η)

)=∥∥∥Y − Yη

∥∥∥2

V

(1 +

penA,L(η)

)back

CA, PB, MM, SR Change-point Detection on a Tree 20/42

Page 66: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Cardinal of Equivalence Classes

Initialization For tipsPropagation

Klk = argmin

1≤p≤K

Sil (p) + Ip 6= k

Si (k) =

L∑l=1

Sil (pl ) + Ipl 6= k , ∀(p1, . . . pL) ∈ K1k × . . .×KL

k

Ti (k) =∑

(p1,...pL)∈K1k×...×K

Lk

L∏l=1

Til (pl ) =L∏

l=1

∑pl∈Kl

k

Til (pl )

Termination Sum on the root vectorback

(1,0,0) (1,0,0)(0,∞,∞)(0,∞,∞)S

T

CA, PB, MM, SR Change-point Detection on a Tree 21/42

Page 67: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Cardinal of Equivalence Classes

Initialization For tipsPropagation

Klk = argmin

1≤p≤K

Sil (p) + Ip 6= k

Si (k) =

L∑l=1

Sil (pl ) + Ipl 6= k , ∀(p1, . . . pL) ∈ K1k × . . .×KL

k

Ti (k) =∑

(p1,...pL)∈K1k×...×K

Lk

L∏l=1

Til (pl ) =L∏

l=1

∑pl∈Kl

k

Til (pl )

Termination Sum on the root vectorback

· · ·

(Si (1), · · · , Si (K))

(Ti (1), · · · ,Ti (K))

(Si1(k))k

(Ti1(k))k

(SiL(k))k

(TiL(k))k

CA, PB, MM, SR Change-point Detection on a Tree 21/42

Page 68: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Cardinal of Equivalence Classes

Initialization For tipsPropagation

Klk = argmin

1≤p≤K

Sil (p) + Ip 6= k

Si (k) =

L∑l=1

Sil (pl ) + Ipl 6= k , ∀(p1, . . . pL) ∈ K1k × . . .×KL

k

Ti (k) =∑

(p1,...pL)∈K1k×...×K

Lk

L∏l=1

Til (pl ) =L∏

l=1

∑pl∈Kl

k

Til (pl )

Termination Sum on the root vectorback

(1,0,0) (1,0,0)(0,∞,∞)(0,∞,∞)S

T

S( ) = 0 + 0 ; T( ) = 1 x 1

S( ) = 1 + 1 ; T( ) = 1 x 1

S( ) = 1 + 1 ; T( ) = 1 x 1

K1 = K1 = K1 =

CA, PB, MM, SR Change-point Detection on a Tree 21/42

Page 69: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Cardinal of Equivalence Classes

Initialization For tipsPropagation

Klk = argmin

1≤p≤K

Sil (p) + Ip 6= k

Si (k) =

L∑l=1

Sil (pl ) + Ipl 6= k , ∀(p1, . . . pL) ∈ K1k × . . .×KL

k

Ti (k) =∑

(p1,...pL)∈K1k×...×K

Lk

L∏l=1

Til (pl ) =L∏

l=1

∑pl∈Kl

k

Til (pl )

Termination Sum on the root vectorback

(1,0,0) (1,0,0) (0,1,0) (0,0,1) (0,0,1)

(1,1,1)

(1,1,1)

(1,1,1)

(1,1,3)

(0,∞,∞)(0,∞,∞)(∞,0,∞)(∞,∞,0)(∞,∞,0)ST

(0,2,2)

(1,1,0)

(1,1,2)

(2,2,2)

ST

ST

ST

ST

CA, PB, MM, SR Change-point Detection on a Tree 21/42

Page 70: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Linking Shifts and Clustering

Assumption “No Homoplasy”: 1 shift = 1 new color

Proposition “K shifts ⇐⇒ K + 1 clusters”

CA, PB, MM, SR Change-point Detection on a Tree 22/42

Page 71: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Linking Shifts and Clustering

Assumption “No Homoplasy”: 1 shift = 1 new color

∼The No Homoplasy hypothesis is not

respected.

Proposition “K shifts ⇐⇒ K + 1 clusters”

CA, PB, MM, SR Change-point Detection on a Tree 22/42

Page 72: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Linking Shifts and Clustering

Assumption “No Homoplasy”: 1 shift = 1 new color

∼The No Homoplasy hypothesis is not

respected.

Proposition “K shifts ⇐⇒ K + 1 clusters”

CA, PB, MM, SR Change-point Detection on a Tree 22/42

Page 73: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Definitions

• T a rooted tree with n tips

• N(T )K = |CK | the number of possible partitions of the tips in K clusters

• A(T )K the number of possible marked partitions

Partitions in two groups for a binarytree with 3 tips

Difference between N(T3)2 and A

(T3)2 :

• N(T3)2 = 3: partitions 1 and 2

are equivalent

• A(T3)2 = 4: one marked color

(“white = ancestral state”)

CA, PB, MM, SR Change-point Detection on a Tree 23/42

Page 74: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

General Formula (Binary Case)

If T is a binary tree, consider T` and Tr the left and right sub-trees of T . Then:N

(T )K =

∑k1+k2=K

N(T`)k1

N(Tr )k2

+∑

k1+k2=K+1

A(T`)k1

A(Tr )k2

A(T )K =

∑k1+k2=K

A(T`)k1

N(Tr )k2

+ N(T`)k1

A(Tr )k2

+∑

k1+k2=K+1

A(T`)k1

A(Tr )k2

We get:

N(T )K+1 = N

(n)K+1 =

(2n − 2− K

K

)and A

(T )K+1 = A

(n)K+1 =

(2n − 1− K

K

)

CA, PB, MM, SR Change-point Detection on a Tree 24/42

Page 75: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Cardinal of Equivalence ClassesNumber of Tree Compatible Clustering

Recursion Formula (General Case)

If we are at a node defining a tree T that has p daughters, with sub-trees T1, . . . , Tp ,then we get the following recursion formulas:

N(T )K =

∑k1+···+kp=K

k1,...,kp≥1

p∏i=1

N(Ti )ki

+∑

I⊂J1,pK|I |≥2

∑k1+···+kp=K+|I |−1

k1,...,kp≥1

∏i∈I

A(Ti )ki

∏i /∈I

N(Ti )ki

A(T )K =

∑I⊂J1,pK|I |≥1

∑k1+···+kp=K+|I |−1

k1,...,kp≥1

∏i∈I

A(Ti )ki

∏i /∈I

N(Ti )ki

No general formula. The result depends on the topology of the tree.

back

CA, PB, MM, SR Change-point Detection on a Tree 25/42

Page 76: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Simulations Design (Uyeda and Harmon, 2014)

• Topology of the tree fixed (unit height, λ = 0.1, with 64, 128, 256 taxa).

• Initial optimal value fixed: β0 = 0

• One ”base” scenario αb = 3, γ2b = 0.5, Kb = 5.

• α ∈ log(2)/0.01, 0.05, 0.1, 0.2, 0.23, 0.3, 0.5, 0.75, 1, 2, 10.• γ2 ∈ 0.3, 0.6, 3, 6, 12, 18, 30, 60, 150/(2αb).

• K ∈ 0, 1, 2, 3, 4, 5, 8, 11, 16.• Shifts values ∼ 1

2N (4, 1) + 1

2N (−4, 1)

• Shifts randomly placed at regular intervals separated by 0.1 unit length.

• n = 200 repetitions : 16200 configurations.

CPU time on cluster MIGALE (Jouy-en-Josas):

• α known: 6 minutes per estimation (66 days in total).

• α unknown: 52 minutes per estimation (570 days in total).

CA, PB, MM, SR Change-point Detection on a Tree 26/42

Page 77: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Log-Likelihood

t1 2 = ln(2) α γ2 K

−300

−200

−100

0

100

−800

−600

−400

−200

0

−300

−275

−250

−225

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

Log

Like

lihoo

d

α

Known

Estimated

Log likelihood for a tree with 256 tips. Solid black dots are the median of thelog likelihood for the true parameters.

CA, PB, MM, SR Change-point Detection on a Tree 27/42

Page 78: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Number of Shifts

t1 2 = ln(2) α γ2 K

0

2

4

6

8

0

3

6

9

0

5

10

15

ntaxa=

64ntaxa

=128

ntaxa=

256

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

K s

elec

t α

Known

Estimated

CA, PB, MM, SR Change-point Detection on a Tree 28/42

Page 79: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

One Example

0

Unit

0

−1.7

−5.7−3.3 −2.7

−4.7

0

Unit

−0.3

−1.7

−4.6−5.6

0

Unit

−0.3

−1.7

−8.95.6

0

Unit

−0.3

−1.7

−11.6−6

CA, PB, MM, SR Change-point Detection on a Tree 29/42

Page 80: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Adjusted Rand Index

t1 2 = ln(2) α γ2 K

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

ntaxa=

64ntaxa

=128

ntaxa=

256

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

AR

I

K

Known

Estimated

α

Known

Estimated

CA, PB, MM, SR Change-point Detection on a Tree 30/42

Page 81: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Parameters: β0

t1 2 = ln(2) α γ2 K

−2

0

2

−2

0

2

−2

0

2

ntaxa=

64ntaxa

=128

ntaxa=

256

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

β 0−

β 0

K

Known

Estimated

α

Known

Estimated

CA, PB, MM, SR Change-point Detection on a Tree 31/42

Page 82: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Parameters: α

t1 2 = ln(2) α γ2 K

0

10

20

30

0

2

4

6

8

−1

0

1

2

3

4

ntaxa=

64ntaxa

=128

ntaxa=

256

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

(t1

2−

t 12)

t 12

K

Known

Estimated

CA, PB, MM, SR Change-point Detection on a Tree 32/42

Page 83: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

Parameters: γ2

t1 2 = ln(2) α γ2 K

−1.0

−0.5

0.0

0.5

1.0

−1.0

−0.5

0.0

0.5

1.0

−1.0

−0.5

0.0

0.5

1.0

ntaxa=

64ntaxa

=128

ntaxa=

256

0.01

0.05 0.

1

0.2

0.23 0.

3

0.5

0.75 1 2 10

0.05 0.

1

0.5 1 2 3 5 10 25 0 1 2 3 4 5 8 11 16

(γ2

−γ2 )

γ2

K

Known

Estimated

α

Known

Estimated

CA, PB, MM, SR Change-point Detection on a Tree 33/42

Page 84: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

BM Model

Data n vectors of p traits at the tips: Yi =

Yi1...

Yip

SDE dW(t) = ΣdBt , rate matrix R = ΣΣT (p × p)

Covariances Cov [Yil ;Yjq] = tijRlq for i , j tips, and l , q characters

Var [vec(Y)] = Cn ⊗ R

Shifts K shifts δ1, · · · , δK vectors size p

7→ All the characters shift at the same time

CA, PB, MM, SR Change-point Detection on a Tree 34/42

Page 85: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

BM Model

Linear Model Representation

vec(Y) = vec(∆TT ) + E with E ∼ N (0,V = Cn ⊗ R)

Incomplete Data Representation

Y3 | Z2 ∼ N(

Z2 + δ, `7R)

Y5

Y4

Y3

Y2

Y1

Z1

Z4

Z2

Z3

`7δ

CA, PB, MM, SR Change-point Detection on a Tree 35/42

Page 86: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

OU Model: General Case

Data n vectors of p traits at the tips: Yi =

Yi1...

Yip

SDE A (p × p) “selection strength”

dW(t) = −A(W(t)− β(t))dt + ΣdBt

Covariances

Cov [Xi ; Xj ] = e−Ati Γe−AT tj

+ e−A(ti−tij )

(∫ tij

0e−Av ΣΣT e−AT vdv

)e−AT (tj−tij )

Shifts K shifts δ1, · · · , δK vectors size p

7→ On the optimal values

CA, PB, MM, SR Change-point Detection on a Tree 36/42

Page 87: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

OU Model: A scalar

Assumption A = αIp “scalar”

Stationnary State S = 12αR

Fixed Root For i , j tips and l , q characters:

Cov [Yil ;Yjq] =1

2αe−2αh

(e2αtij − 1

)Rlq

7→ Can be reduced to a BM on a re-scaled tree

CA, PB, MM, SR Change-point Detection on a Tree 37/42

Page 88: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

EM algorithm

BM Natural generalization of the univariate case.

OU M step intractable in general.

Incomplete Data Model: Can readily handle missing data.

CA, PB, MM, SR Change-point Detection on a Tree 38/42

Page 89: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

Model Selection

• Previous criterion cannot be applied

• Solution: “Slope Heuristic”-based method

Massart (2007)

oracle inequality with known variancepenalty up to a multiplicative constant

Baudry et al. (2012)

Slope-heuristic method to calibrate the constantImplemented in capushe (Brault et al., 2012)

CA, PB, MM, SR Change-point Detection on a Tree 39/42

Page 90: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

Model Selection: Toy Example

0

Unit

1

5

0

1

0

Unit

−1

5

0

1

0

Unit

0

5

−10

1

Figure: Simulated Process.

CA, PB, MM, SR Change-point Detection on a Tree 40/42

Page 91: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

Model Selection: Toy Example−

350

−30

0−

250

−20

0−

150

Contrast representation

penshape(m) (labels : Model names)

−γ n

(sm)

The regression line is computed with 6 points

0 1 2 3 4 5 6 7 8 9 10

0.23

50.

250

0.26

5

Successive slope values

Number of points (penshape(m),− γn(sm)) for the regression

κ

11 11 10 9 8 8 7 6 5 5 4 3 2

Selected models with respect to the successive slope values

Number of points (penshape(m),− γn(sn)) for the regression

Mod

el11 11 10 9 8 8 7 6 5 5 4 3 2

2

Figure: capushe output for penalized log-likelihood.

CA, PB, MM, SR Change-point Detection on a Tree 41/42

Page 92: Change-point Detection on a Tree to Study Evolutionary ...pbastide.github.io/docs/20160628_JOBIM.pdf · Change-point Detection on a Tree to Study Evolutionary Adaptation from Present-day

ReferencesInference

Identifiability IssuesSimulations Results

Multivariate

ModelsInference

Model Selection: Toy Example

0

Unit

1.2

0.6

4.5

0

Unit

−1.1

0.3

5.2

0

Unit

−0.1

−10.1

4.6

Figure: Reconstructed Process.

CA, PB, MM, SR Change-point Detection on a Tree 42/42