A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data

David S. Matteson
Department of Statistical Science
Cornell University
[email protected]/~matteson

Joint work with: Nicholas A. James, ORIE, Cornell University
Sponsorship: National Science Foundation

2014 October
David S. Matteson ([email protected]) Change Point Analysis 2014 October 1 / 40
Introduction
Change Point Analysis
The process of detecting distributional changes within time-ordered data

Framework:
- Retrospective, offline analysis
- Multivariate observations
- Estimation: number of change points and their positions
- Hierarchical algorithms

Applications:
- Genetics
- Finance
- Emergency Medical Services
Change Point Analysis

Given independent, time-ordered observations X_1, X_2, …, X_n ∈ R^d

Partition into k homogeneous, temporally contiguous subsets
- k is unknown
- The size of each subset is unknown
Cluster Analysis
Change point analysis is similar to cluster analysis
In cluster analysis we also wish to partition the observations into homogeneous subsets
- Subsets may not be contiguous in time without some constraints
Hierarchical Estimation
Apply methods from clustering to find change points
Exhaustive search is not practical: O(n^k), in general.

One may consider dynamic programming.

We use a hierarchical or sequential approach: O(kn²)
- Divisive: Clusters are divided until each observation is its own cluster
- Agglomerative: Clusters are merged until all observations belong to a single cluster
Hierarchical Estimation: Divisive Progression
Hierarchical Estimation: Agglomerative Progression
Multivariate Homogeneity
Measuring Multivariate Homogeneity
Suppose X, Y ∈ R^d with X ∼ F_x ⊥⊥ Y ∼ F_y

Let φ_x(t) = E(e^{i⟨t,X⟩}) and φ_y(t) = E(e^{i⟨t,Y⟩}) denote their characteristic functions

Define a divergence between F_x and F_y as

  E(X, Y; w) = ∫_{R^d} |φ_x(t) − φ_y(t)|² w(t) dt,

in which w(t) denotes an arbitrary positive weight function for which E exists
A Weight Function
A convenient choice for w(t) > 0 (Székely and Rizzo, 2005):

  w(t; α) = ( [2π^{d/2} Γ(1 − α/2)] / [α 2^α Γ((d + α)/2)] · |t|^{d+α} )^{−1},

in which Γ(·) is the gamma function

Note: for any fixed (d, α), w(t; α) ∝ |t|^{−(d+α)}
Equivalent Divergence Measures

Let X and Y be independent, and let (X′, Y′) be an iid copy of (X, Y)

Theorem
Suppose that E(|X|^α + |Y|^α) < ∞ for some α ∈ (0, 2]. Then

  E(X, Y; α) = ∫_{R^d} |φ_x(t) − φ_y(t)|² ( [2π^{d/2} Γ(1 − α/2)] / [α 2^α Γ((d + α)/2)] · |t|^{d+α} )^{−1} dt
             = 2 E|X − Y|^α − E|X − X′|^α − E|Y − Y′|^α
             < ∞

- If 0 < α < 2, then E(X, Y; α) = 0 if and only if X and Y are identically distributed
- If α = 2, then E(X, Y; α) = 0 if and only if EX = EY
An Empirical Measure (U-statistics)
Let X_n = {X_i : i = 1, …, n} and Y_m = {Y_j : j = 1, …, m} be independent iid samples from the distributions of X, Y ∈ R^d, respectively, such that E|X|^α, E|Y|^α < ∞ for some α ∈ (0, 2)

Define

  E(X_n, Y_m; α) = (2/mn) Σ_{i=1}^{n} Σ_{j=1}^{m} |X_i − Y_j|^α
                   − (n choose 2)^{−1} Σ_{1≤i<k≤n} |X_i − X_k|^α
                   − (m choose 2)^{−1} Σ_{1≤j<k≤m} |Y_j − Y_k|^α

and

  Q(X_n, Y_m; α) = [mn/(m + n)] E(X_n, Y_m; α)
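To make the empirical measure concrete, the U-statistic above can be computed in a few lines. The following Python sketch mirrors the definition term by term; it is illustrative only, and the function names `energy_stat` and `q_stat` are my own, not from the ecp package:

```python
import numpy as np

def energy_stat(X, Y, alpha=1.0):
    """Empirical divergence E(X_n, Y_m; alpha) in the U-statistic form:
    2 * mean_ij |X_i - Y_j|^a - mean_{i<k} |X_i - X_k|^a - mean_{j<k} |Y_j - Y_k|^a."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    if X.ndim == 1:
        X = X[:, None]
    if Y.ndim == 1:
        Y = Y[:, None]
    n, m = len(X), len(Y)

    def dist(A, B):
        # pairwise Euclidean distances raised to the power alpha
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2) ** alpha

    between = dist(X, Y).mean()                          # estimates E|X - Y|^a
    within_x = dist(X, X)[np.triu_indices(n, 1)].mean()  # estimates E|X - X'|^a
    within_y = dist(Y, Y)[np.triu_indices(m, 1)].mean()  # estimates E|Y - Y'|^a
    return 2.0 * between - within_x - within_y

def q_stat(X, Y, alpha=1.0):
    """Scaled statistic Q(X_n, Y_m; alpha) = mn/(m+n) * E(X_n, Y_m; alpha)."""
    n, m = len(X), len(Y)
    return n * m / (n + m) * energy_stat(X, Y, alpha)
```

For two samples from the same distribution the statistic is near zero, while for well-separated samples it grows with the separation.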
Known Location: Two-Sample Homogeneity Test

By the strong law of large numbers for U-statistics (Hoeffding, 1961),

  E(X_n, Y_m; α) → E(X, Y; α)

almost surely, as min(m, n) → ∞.

Under the null hypothesis of equal distributions, i.e. E(X, Y; α) = 0,

  Q(X_n, Y_m; α) → Q(X, Y; α) = Σ_{i=1}^{∞} λ_i Q_i

in distribution, as min(m, n) → ∞. Here, the λ_i > 0 are constants that depend on α and the distributions of X and Y, and the Q_i are iid χ²_1 random variables; see Rizzo and Székely (2010).

Under the alternative hypothesis of unequal distributions, i.e. E(X, Y; α) > 0,

  Q(X_n, Y_m; α) → ∞ almost surely, as min(m, n) → ∞.
Single Change Point
Single Change Point: Unknown Location

Let Z_1, …, Z_T ∈ R^d be an independent sequence.

Suppose the sample is heterogeneous, with observations drawn from two distributions.

Let γ ∈ (0, 1) denote the division of the observations, such that Z_1, …, Z_⌊γT⌋ ∼ F_x and Z_{⌊γT⌋+1}, …, Z_T ∼ F_y for every sample of size T.

Define X_τ = {Z_1, Z_2, …, Z_τ} and Y_τ = {Z_{τ+1}, Z_{τ+2}, …, Z_T}.

A change point location τ̂_T is then estimated as

  τ̂_T = argmax_τ Q(X_τ, Y_τ; α).

Theorem
If E(X, Y; α) < ∞ and γ ∈ (0, 1), then τ̂_T / T → γ almost surely, as T → ∞.
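The estimator τ̂_T can be realized by a brute-force scan: compute Q(X_τ, Y_τ; α) at every admissible split τ and take the argmax. A minimal Python sketch of this single-change-point step (my own illustrative code with a `min_size` guard on cluster sizes; not the authors' implementation):

```python
import numpy as np

def estimate_change_point(Z, alpha=1.0, min_size=5):
    """Brute-force estimate of a single change point: scan every admissible
    split tau and return the one maximizing Q(X_tau, Y_tau; alpha)."""
    Z = np.asarray(Z, float)
    if Z.ndim == 1:
        Z = Z[:, None]
    T = len(Z)
    # all pairwise distances |Z_i - Z_j|^alpha, computed once and then sliced
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2) ** alpha

    best_tau, best_q = None, -np.inf
    for tau in range(min_size, T - min_size + 1):
        n, m = tau, T - tau
        between = D[:tau, tau:].mean()
        within_x = D[:tau, :tau][np.triu_indices(n, 1)].mean()
        within_y = D[tau:, tau:][np.triu_indices(m, 1)].mean()
        q = n * m / (n + m) * (2.0 * between - within_x - within_y)
        if q > best_q:
            best_tau, best_q = tau, q
    return best_tau, best_q
```

Precomputing the distance matrix once and slicing it per split keeps each scan at O(T²) distance evaluations.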
Multiple Change Points
Multiple Change Points: Unknown Locations
A generalized bisection approach for sequential estimation
For 1 ≤ τ < κ ≤ T, define:

  X_τ = {Z_1, Z_2, …, Z_τ} and Y_τ(κ) = {Z_{τ+1}, Z_{τ+2}, …, Z_κ}

A change point location τ̂ is then estimated as

  (τ̂, κ̂) = argmax_{(τ,κ)} Q(X_τ, Y_τ(κ); α).
Sequentially Estimating Multiple Change Points

Suppose k − 1 change points have been estimated, at locations τ̂_1 < ⋯ < τ̂_{k−1}

This partitions the observations into k clusters C_1, C_2, …, C_k

Given these clusters, we then apply the single change point procedure within each of the k clusters.

For the ith cluster C_i, denote the proposed change point location by τ̂(i), and the associated constant by κ̂(i)

Now let

  i* = argmax_{i ∈ {1,…,k}} Q[X_{τ̂(i)}, Y_{τ̂(i)}(κ̂(i)); α],

in which X_{τ̂(i)} and Y_{τ̂(i)}(κ̂(i)) are defined with respect to C_i

Denote the test statistic as

  q̂_k = Q(X_{τ̂_k}, Y_{τ̂_k}(κ̂_k); α),

in which τ̂_k = τ̂(i*) is the kth estimated change point, located within cluster C_{i*}
The E-Divisive Algorithm: Estimation

The E-Divisive Algorithm: Estimating Location

A_τ = {Z_1, Z_2, …, Z_τ} and B_τ(κ) = {Z_{τ+1}, Z_{τ+2}, …, Z_κ}

Recall, a change point location τ̂ is estimated as

  (τ̂, κ̂) = argmax_{(τ,κ)} Q(A_τ, B_τ(κ); α)

Thus, we maximize [mn/(n+m)] E(A, B; α) over all subsets A and B:
The E-Divisive Algorithm: Inference

The E-Divisive Algorithm: Inference via Permutation Test

The distribution of the test statistic q̂* = Q(A_τ̂, B_τ̂(κ̂); α) is unknown

The significance of a proposed change point is measured via a permutation test

Randomly permute the series, maximize [mn/(n+m)] E(A, B; α), record, and repeat:
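The permutation step can be sketched as: record the maximized statistic on the observed ordering, recompute it on R random permutations of the series, and report the fraction of permuted maxima at least as large. The Python below is an illustrative simplification (single change point over the full series, no κ search), not the ecp implementation:

```python
import numpy as np

def max_q(Z, alpha=1.0, min_size=5):
    """Maximum of Q(X_tau, Y_tau; alpha) over admissible split points tau."""
    T = len(Z)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2) ** alpha
    best = -np.inf
    for tau in range(min_size, T - min_size + 1):
        n, m = tau, T - tau
        e = (2.0 * D[:tau, tau:].mean()
             - D[:tau, :tau][np.triu_indices(n, 1)].mean()
             - D[tau:, tau:][np.triu_indices(m, 1)].mean())
        best = max(best, n * m / (n + m) * e)
    return best

def permutation_pvalue(Z, R=99, alpha=1.0, min_size=5, seed=0):
    """Approximate p-value: how often does a randomly permuted series
    achieve a maximized statistic at least as large as the observed one?"""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, float)
    if Z.ndim == 1:
        Z = Z[:, None]
    observed = max_q(Z, alpha, min_size)
    exceed = sum(
        max_q(rng.permutation(Z), alpha, min_size) >= observed
        for _ in range(R)
    )
    return (exceed + 1) / (R + 1)
```

Permuting destroys the temporal ordering while preserving the marginal distribution, so a small p-value indicates an ordering-dependent (distributional) change.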
The E-Divisive Algorithm: Multiple Change Points
If q̂* = Q(A_τ̂, B_τ̂(κ̂); α) is insignificant: STOP

If significant, condition on the location, and repeat within clusters:
The E-Divisive Algorithm: Multiple Change Points
Once again, perform a permutation test

However, only permute within each cluster:
The E-Divisive Algorithm: ecp Package

The ‘ecp’ R package (CRAN)

Signature:

  e.divisive(X, sig.lvl=0.05, R=199, k=NULL, min.size=30, alpha=1)

Arguments:
- X: A T × d matrix representation of a length-T time series, with d-dimensional observations.
- sig.lvl: The significance level used for the permutation test.
- R: The maximum number of permutations to perform in the permutation test.
- k: The number of change points to return. If this is NULL, only the statistically significant estimated change points are returned.
- min.size: The minimum number of observations between change points.
- alpha: The index α for the test statistic.
The ‘ecp’ R package (CRAN)

Returned list:
- k.hat: Number of clusters created by the estimated change points.
- order.found: The order in which the change points were estimated.
- estimates: Locations of the statistically significant change points.
- considered.last: Location of the last change point that was not found to be statistically significant at the given significance level.
- permutations: The number of permutations performed by each of the sequential permutation tests.
- cluster: The estimated cluster membership vector.
- p.values: Approximate p-values estimated from each permutation test.

Complexity is O(kT²)
Simulation
Simulation Study: Rand Index

Compare E-Divisive with a generalized Wilcoxon/Mann-Whitney approach: the MultiRank procedure of Lung-Yut-Fong et al. (2011)

For two partitions U and V, the Rand Index considers all pairs of observations.

Define
- {A}: pairs in the same cluster under U and in the same cluster under V
- {B}: pairs in different clusters under U and in different clusters under V

  Rand Index = (#A + #B) / (T choose 2)

An equivalent definition of the Rand Index can be found in Hubert and Arabie (1985)

  Adjusted Rand = (Index − Expected Index) / (Max Index − Expected Index)
                = (Rand − Expected Rand) / (1 − Expected Rand)
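Both indices follow directly from their pair-counting definitions. An illustrative Python sketch (my own helper names; the adjusted index uses the contingency-table form from Hubert and Arabie, 1985):

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(u, v):
    """Rand index of two label vectors: fraction of observation pairs on
    which the partitions agree (together in both, or apart in both)."""
    T = len(u)
    agree = 0
    for i, j in combinations(range(T), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        agree += same_u == same_v  # counts pairs in {A} union {B}
    return agree / comb(T, 2)

def adjusted_rand_index(u, v):
    """Adjusted Rand index, (Index - E[Index]) / (Max Index - E[Index]),
    computed from the contingency table of the two partitions."""
    T = len(u)
    n_ij = Counter(zip(u, v))  # joint cluster counts
    sum_ij = sum(comb(n, 2) for n in n_ij.values())
    sum_u = sum(comb(n, 2) for n in Counter(u).values())
    sum_v = sum(comb(n, 2) for n in Counter(v).values())
    expected = sum_u * sum_v / comb(T, 2)
    max_index = (sum_u + sum_v) / 2
    return (sum_ij - expected) / (max_index - expected)
```

The adjusted index equals 1 for identical partitions (up to label renaming) and is 0 in expectation for random labelings.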
A change in variance for univariate normal data
Method Correct k Average Adjusted Rand
MultiRank 22/100 0.504
E-Divisive 95/100 0.909
A change in correlation for bivariate normal data
Method Correct k Average Adjusted Rand
MultiRank 72/100 0.166
E-Divisive 92/100 0.997
1,000 simulations, 2 CP: N(0,1), N(µ,1), N(0,1)
              Average Rand            Average Adj. Rand
T     µ    MultiRank  E-Divisive   MultiRank  E-Divisive
150   1    0.940      0.948        0.867      0.885
      2    0.977      0.991        0.949      0.981
      4    0.981      1.000        0.958      1.000
300   1    0.970      0.972        0.933      0.937
      2    0.989      0.996        0.975      0.991
      4    0.991      1.000        0.979      1.000
600   1    0.986      0.986        0.968      0.969
      2    0.994      0.998        0.987      0.996
      4    0.995      1.000        0.990      1.000
1,000 simulations, 2 CP: N(0,1), N(0,σ²), N(0,1)

              Average Rand            Average Adj. Rand
T     σ²   MultiRank  E-Divisive   MultiRank  E-Divisive
150   2    0.731      0.902        0.471      0.785
      5    0.764      0.976        0.521      0.948
      10   0.764      0.989        0.519      0.975
300   2    0.744      0.924        0.490      0.834
      5    0.759      0.990        0.511      0.978
      10   0.759      0.995        0.512      0.989
600   2    0.742      0.970        0.488      0.933
      5    0.753      0.996        0.500      0.990
      10   0.753      0.998        0.501      0.995
1,000 simulations, 2 CP: N(0,1), t_ν(0,1), N(0,1)

              Average Rand            Average Adj. Rand
T     ν    MultiRank  E-Divisive   MultiRank  E-Divisive
150   16   0.632      0.798        0.327      0.564
      8    0.651      0.830        0.353      0.631
      2    0.679      0.846        0.395      0.666
300   16   0.640      0.755        0.341      0.492
      8    0.639      0.769        0.338      0.522
      2    0.680      0.809        0.396      0.596
600   16   0.655      0.735        0.365      0.469
      8    0.653      0.727        0.359      0.458
      2    0.697      0.813        0.420      0.608
1,000 simulations, 2 CP: N_2(0, I), N_2(µ, I), N_2(0, I)

              Average Rand            Average Adj. Rand
T     µ    MultiRank  E-Divisive   MultiRank  E-Divisive
300   1    0.656      0.698        0.363      0.406
      2    0.713      0.732        0.446      0.468
      3    0.743      0.778        0.489      0.549
600   1    0.991      0.994        0.981      0.987
      2    0.995      1.000        0.989      0.999
      3    0.996      1.000        0.990      1.000
900   1    0.994      0.996        0.987      0.991
      2    0.997      1.000        0.993      0.999
      3    0.997      1.000        0.993      1.000
1,000 simulations, 2 CP: N2(0,Σ),N2(0, I ),N2(0,Σ)
Σ = ( 1  ρ )
    ( ρ  1 )

          Average Rand            Average Adj. Rand
T    ρ    MultiRank  E-Divisive   MultiRank  E-Divisive
300  0.5  0.663      0.729        0.373      0.455
     0.7  0.712      0.728        0.444      0.462
     0.9  0.745      0.743        0.491      0.488
600  0.5  0.674      0.676        0.391      0.386
     0.7  0.724      0.672        0.462      0.370
     0.9  0.745      0.834        0.492      0.673
900  0.5  0.692      0.635        0.415      0.322
     0.7  0.724      0.678        0.464      0.398
     0.9  0.747      0.966        0.494      0.928
1,000 simulations, 2 CP: Nd(0,Σ),Nd(0, I ),Nd(0,Σ)
Without noise, Σ is the equicorrelation matrix; with noise, only the first
two coordinates share correlation ρ and the rest are independent:

Σ (w/o noise) = ( 1  ρ  ρ  ···  ρ )     Σ (w/ noise) = ( 1  ρ  0  ···  0 )
                ( ρ  1  ρ  ···  ρ )                    ( ρ  1  0  ···  0 )
                ( ρ  ρ  1  ···  ρ )                    ( 0  0  1  ···  0 )
                ( ⋮  ⋮  ⋮  ⋱   ⋮ )                    ( ⋮  ⋮  ⋮  ⋱   ⋮ )
                ( ρ  ρ  ρ  ···  1 )                    ( 0  0  0  ···  1 )

          Without Noise                With Noise
T    d    Avg. Rand  Avg. Adj. Rand   Avg. Rand  Avg. Adj. Rand
300  2    0.767      0.522            0.774      0.543
     5    0.912      0.816            0.736      0.463
     9    0.970      0.935            0.736      0.459
600  2    0.817      0.648            0.836      0.816
     5    0.993      0.984            0.631      0.626
     9    0.998      0.995            0.666      0.648
900  2    0.970      0.937            0.968      0.933
     5    0.998      0.996            0.644      0.342
     9    0.999      0.999            0.612      0.284
Applications Genetics
Genetics Data

We applied E-divisive to the aCGH micro-array dataset of 43 individuals with a bladder tumor (Bleakley and Vert, 2011); shown is the relative hybridization intensity profile for one individual.

MultiRank (Lung-Yut-Fong et al., 2011): k = 17, adjRand = 0.677
KCPA (Arlot et al., 2012): k = 41, adjRand = 0.658
PELT (Killick et al., 2012): k = 47, adjRand = 0.853
Figure: Signal (relative hybridization intensity) vs. index, with the segmentations found by MultiRank, KCPA, PELT, and E-divisive (one panel per method).
Applications Finance
Financial Data: Cisco Systems
The E-divisive procedure was applied to the monthly log returns of the Dow 30.

Marginal analysis of Cisco Systems Inc. from April 1990 to January 2010: the procedure found change points at April 2000 and October 2002.
Financial Data: S&P 500 Index
Figure: S&P 500 log returns, May 20, 1999 – April 25, 2011.
Agglomerative Algorithm
An Agglomerative Algorithm
Given a partition of k clusters C = {C1, C2, . . . , Ck}; clusters may or may not be single observations.

Consider combining a pair of adjacent clusters.

The partition that maximizes the goodness-of-fit statistic determines the change point locations.
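The merging step can be sketched as follows. This is a sketch, not the paper's implementation: the names are ours, and the toy dissimilarity (absolute difference of segment means) stands in for the E-distance the method actually uses.

```python
def mean_gap(a, b):
    """Toy dissimilarity between two segments: |mean(a) - mean(b)|."""
    return abs(sum(a) / len(a) - sum(b) / len(b))

def agglomerate(segments, distance):
    """Greedily merge the closest pair of ADJACENT segments until one
    remains, recording each intermediate partition. The partition that
    maximizes the goodness-of-fit statistic would then be selected
    (that selection step is omitted here)."""
    segs = [list(s) for s in segments]
    history = [[list(s) for s in segs]]
    while len(segs) > 1:
        # Only adjacent pairs are candidates, preserving temporal contiguity.
        i = min(range(len(segs) - 1),
                key=lambda j: distance(segs[j], segs[j + 1]))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
        history.append([list(s) for s in segs])
    return history

history = agglomerate([[0.1], [0.0], [5.0], [5.1]], mean_gap)
```

Restricting merges to adjacent clusters is what distinguishes this from ordinary hierarchical clustering: estimated change points are always boundaries between temporally contiguous blocks.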
An Agglomerative Algorithm: Goodness-of-Fit
Goodness-of-fit statistic S(k): sum of the E-distances between adjacent clusters.

Given clusters C = {C1, C2, . . . , Ck} with ni = #Ci, define

S(k) = \sum_{i=1}^{k-1} \left( \frac{n_i n_{i+1}}{n_i + n_{i+1}} \right) \mathcal{E}^{\alpha}_{n_i, n_{i+1}}(C_i, C_{i+1}).
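The E-distance here is the sample energy statistic of Székely and Rizzo. A direct O(n²) sketch for scalar samples, with within-sample averages taken over distinct pairs; function names are our own:

```python
def between(xs, ys, alpha=1.0):
    """Average |x - y|^alpha over all cross-sample pairs."""
    return sum(abs(x - y) ** alpha for x in xs for y in ys) / (len(xs) * len(ys))

def within(xs, alpha=1.0):
    """Average |x - x'|^alpha over distinct within-sample pairs."""
    n = len(xs)
    if n < 2:
        return 0.0
    s = sum(abs(xs[i] - xs[j]) ** alpha
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

def e_distance(xs, ys, alpha=1.0):
    """Sample E-distance between two one-dimensional samples."""
    return 2.0 * between(xs, ys, alpha) - within(xs, alpha) - within(ys, alpha)

def goodness_of_fit(clusters, alpha=1.0):
    """S(k): weighted sum of E-distances between temporally adjacent clusters."""
    total = 0.0
    for ci, cj in zip(clusters, clusters[1:]):
        ni, nj = len(ci), len(cj)
        total += (ni * nj / (ni + nj)) * e_distance(ci, cj, alpha)
    return total

# Two well-separated clusters yield a much larger S than a homogeneous pair.
tight = [0.0, 0.1, -0.1, 0.05]
shifted = [5.0, 5.1, 4.9, 5.05]
```

For 0 < α < 2 the E-distance is nonnegative and equals zero in the limit only when the two distributions coincide, which is what makes S(k) a sensible goodness-of-fit criterion.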
An Agglomerative Algorithm
The partition that maximizes S(k) is then used to estimate the change point locations.

Figure: Progression of the goodness-of-fit statistic, and where it is maximized.
Agglomerative Algorithm Application: EMS
EMS Priority One Response for Toronto 2007
Bibliography
http://www.stat.cornell.edu/~matteson/
Bleakley, K., and Vert, J.-P. (2011), "The Group Fused Lasso for Multiple Change-Point Detection," Technical Report HAL-00602121, Bioinformatics Center (CBIO).

Hoeffding, W. (1961), "The Strong Law of Large Numbers for U-Statistics," Technical Report 302, Department of Statistics, North Carolina State University.

Hubert, L., and Arabie, P. (1985), "Comparing Partitions," Journal of Classification, 2(1), 193–218.

James, N. A., and Matteson, D. S. (2013), "ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data," arXiv:1309.3295.

Lung-Yut-Fong, A., Lévy-Leduc, C., and Cappé, O. (2011), "Homogeneity and Change-Point Detection Tests for Multivariate Data Using Rank Statistics."

Matteson, D. S., and James, N. A. (2013), "A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data," Journal of the American Statistical Association, to appear.

Rizzo, M. L., and Székely, G. J. (2010), "DISCO Analysis: A Nonparametric Extension of Analysis of Variance," The Annals of Applied Statistics, 4(2), 1034–1055.

Székely, G. J., and Rizzo, M. L. (2005), "Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method," Journal of Classification, 22(2), 151–183.