Effects of Rooting on Phylogenic Algorithms Margareta Ackerman Joint work with David Loker and Dan Brown

Page 1:

Effects of Rooting on Phylogenic Algorithms

Margareta Ackerman

Joint work with David Loker and Dan Brown

Page 2: Hierarchical Clustering & Phylogeny

Page 3: Phylogeny meets Hierarchical Clustering

Phylogeny is an application of Hierarchical Clustering.

They are closely related!

Unfortunately, there is a disconnect between these fields.

Page 4: Bridging the Gap

A step towards bridging the gap:

We bring techniques from cluster analysis to study Phylogenetic algorithms.

We apply a recent framework for clustering algorithm selection to Phylogeny

[(Ackerman, Ben-David, and Loker, ‘10), (Ackerman, Ben-David, and Loker, ‘10), (Ackerman & Ben-David, IJCAI ‘11), (Zadeh and Ben-David, ‘09)]

Page 5: Selecting Phylogenetic Algorithms

Given the same input, different Phylogenetic algorithms can produce radically different results.

How should a user decide which algorithm to use?

Page 6: Framework for Selecting Phylogenetic Algorithms

This framework lets a user utilize prior knowledge to select an algorithm

• Identify properties that distinguish between different input-output behaviour of clustering paradigms

• The properties should be:
1) Intuitive and “user-friendly”
2) Useful for distinguishing clustering algorithms

Page 7: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 8: How to Root Phylogenetic Trees?

A common solution:

Introduce distant taxa (or elements) and root where the distant taxa connect with the ingroup.

[Figure: tree with a distant taxon E]

Page 9: When Rooting Changes the Ingroup

The addition of an outgroup can CHANGE the topology of the ingroup.

[Figure: the ingroup tree after adding outgroup E]

Page 10: This Happens in Practice!

Empirical studies demonstrate that, with some algorithms, ingroup topology can be disrupted when an outgroup is added [(Holland et al., ‘03), (Shavit et al., ‘07), (Lin et al., ‘02), (Slack et al., ‘03)]

We perform a theoretical analysis of this phenomenon, proving that some algorithms are immune to this problem, while others are highly volatile.

Page 11: Previous Work

Independently of our work, it was shown that when using BME (balanced minimum evolution), the ingroup topology can change arbitrarily when an outlier is added (Cueto and Matsen, 2010).

Page 12: Our Contributions

• Linkage-based algorithms (including UPGMA) do not change the ingroup topology when the outgroup is sufficiently far away

• Using Neighbor Joining, ingroup topology is affected by outgroups even if the outgroup is arbitrarily far away

Page 13: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 14: Formal Setup

C_i is a cluster in a dendrogram D if there exists a node in D such that C_i is the set of its leaf descendants.

Page 15: Formal Setup

C = {C_1, …, C_k} is a clustering in a dendrogram D if

– C_i is a cluster in D for all 1 ≤ i ≤ k, and

– Clusters are disjoint
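The two definitions above can be checked mechanically. The following is a minimal sketch, assuming a dendrogram is encoded as nested Python tuples with non-tuple values as leaves; that encoding and the names `leaves`, `clusters` and `is_clustering` are illustrative choices, not part of the talk.

```python
def leaves(node):
    """Set of leaf labels below a node; any non-tuple value is a leaf."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def clusters(dendrogram):
    """All clusters of the dendrogram: the leaf set of every node."""
    found = {leaves(dendrogram)}
    if isinstance(dendrogram, tuple):
        for child in dendrogram:
            found |= clusters(child)
    return found


def is_clustering(candidate, dendrogram):
    """True if every set is a cluster of the dendrogram and the sets are pairwise disjoint."""
    all_clusters = clusters(dendrogram)
    sets = [frozenset(c) for c in candidate]
    if any(c not in all_clusters for c in sets):
        return False
    # Pairwise disjoint exactly when no element is counted twice.
    return sum(len(c) for c in sets) == len(frozenset().union(*sets))


# Hypothetical example dendrogram over five elements.
D = (("a", "b"), ("c", ("d", "e")))
print(is_clustering([{"a", "b"}, {"d", "e"}], D))  # True
print(is_clustering([{"a", "b"}, {"b"}], D))       # False: the sets overlap
```

As on the slide, the check does not require the clusters to cover every element, only that each is a cluster of D and that they are disjoint.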

Page 16: Formal Setup

A Hierarchical Clustering Algorithm A maps

Input: A data set X with a distance function d, denoted (X,d)

to

Output: A dendrogram of X

The distance between Y ⊆ X and Z ⊆ X is the length of the minimum edge between them:

d(Y,Z) = min_{y ∈ Y, z ∈ Z} d(y,z)
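As a small illustration of the set distance d(Y,Z), here is a sketch that takes the point-to-point distance as an ordinary Python function. The name `set_distance` and the points on a line are assumptions made for the example.

```python
def set_distance(d, Y, Z):
    """Minimum of d(y, z) over all y in Y and z in Z (both assumed non-empty)."""
    return min(d(y, z) for y in Y for z in Z)


def d(a, b):
    """Example distance: points on the real line."""
    return abs(a - b)


print(set_distance(d, {0, 1}, {4, 6}))  # 3, attained by the pair (1, 4)
```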

Page 17: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 18: Unaffected by an Outgroup

Given a data set (X ∪ O, d) and algorithm A, X is unaffected by O if A(X, d) is a sub-dendrogram of A(X ∪ O, d).

Otherwise, X is affected by O.

[Figure: A(X,d), A(O,d), and A(X ∪ O,d)]
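The "unaffected" test can be sketched in code. This version assumes that "sub-dendrogram" means some node of the larger dendrogram has a subtree equal to the smaller dendrogram, ignoring the left-to-right order of children; the talk does not spell this out, so treat it as one plausible reading. Trees are nested tuples and all function names are illustrative.

```python
def canonical(node):
    """Order-independent form of a tree given as nested tuples."""
    if not isinstance(node, tuple):
        return node
    return frozenset(canonical(child) for child in node)


def subtrees(node):
    """Yield the tree itself and every subtree below it."""
    yield node
    if isinstance(node, tuple):
        for child in node:
            yield from subtrees(child)


def is_sub_dendrogram(small, large):
    """True if some node of `large` roots a copy of `small`."""
    target = canonical(small)
    return any(canonical(t) == target for t in subtrees(large))


ingroup_tree = (("a", "b"), "c")               # stands in for A(X, d)
kept = ((("b", "a"), "c"), "e")                # A(X ∪ O, d) with ingroup intact
changed = (("a", ("b", "c")), "e")             # A(X ∪ O, d) with ingroup rearranged

print(is_sub_dendrogram(ingroup_tree, kept))     # True: X is unaffected by O
print(is_sub_dendrogram(ingroup_tree, changed))  # False: X is affected by O
```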

Page 19: Outgroup Independence

Algorithm A is outgroup-independent if for any data sets (X, d) and (O, d’), if (X,d) and (O,d’) are sufficiently far apart, then X is unaffected by O.

[Figure: an ingroup and an outgroup]

Page 20: Outgroup Independence

Algorithm A is outgroup-independent if for any data sets (X, d) and (O, d’), if (X,d) and (O,d’) are sufficiently far apart, then X is unaffected by O.

[Figure: A(X,d), A(O,d’), and A(X ∪ O,d*)]

d* puts (X,d) and (O,d’) sufficiently far apart

Page 21: Outgroup Volatility

An algorithm A is outgroup volatile if for any data set (X,d) and any constant c, there exists (O,d’) with distance between X and O at least c, such that X is affected by O.

If O is a singleton, then A is outlier volatile.
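The two properties differ mainly in their quantifiers, which is easier to see when they are written side by side. The following is one possible formal reading, not a statement taken from the talk: the threshold c in the first line and the combined distance function d* in the second are assumptions that fill in the phrases "sufficiently far apart" and "distance between X and O".

```latex
% Outgroup independence (one reading): far enough always suffices.
\[
\forall (X,d),\,(O,d')\;\; \exists c \;\; \forall d^{*} \text{ over } X \cup O \text{ extending } d, d' :\quad
d^{*}(X,O) \ge c \;\Longrightarrow\; A(X,d) \text{ is a sub-dendrogram of } A(X \cup O, d^{*})
\]

% Outgroup volatility (one reading): no distance is ever far enough.
\[
\forall (X,d)\;\; \forall c \;\; \exists (O,d'),\; d^{*} \text{ extending } d, d' :\quad
d^{*}(X,O) \ge c \;\wedge\; A(X,d) \text{ is not a sub-dendrogram of } A(X \cup O, d^{*})
\]
```

Read this way, an algorithm cannot have both properties on the same ingroup.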

Page 22: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 23:

We use the following general result to show that Linkage-Based algorithms are outgroup-independent.

Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup-independent.

Page 24: Locality

If we select a cluster from the dendrogram, and run the algorithm on the data underlying this cluster, we obtain a result that is consistent with the original dendrogram.

[Figure: D = A(X,d) and D’ = A(X’,d), where X’ = {x1, …, x4}]

Page 25: Outer Consistency

[Figure: A(X,d) with a clustering C; C shown on data set (X,d) and on data set (X,d’), where d’ is an outer-consistent change of d]

If A is outer-consistent, then A(X,d’) will also include the clustering C.
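The slide shows an outer-consistent change only as a picture. A sketch of a checker follows, under the assumption (taken from the usual definition in the clustering-properties literature, not from this talk) that d’ is an outer-consistent change of d with respect to C when within-cluster distances are unchanged and between-cluster distances do not decrease. The function name and example data are hypothetical.

```python
from itertools import combinations


def is_outer_consistent_change(clustering, d, d_new):
    """Check the assumed definition: same-cluster distances equal,
    different-cluster distances not smaller under d_new."""
    label = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    for x, y in combinations(sorted(label), 2):
        if label[x] == label[y]:
            if d_new(x, y) != d(x, y):
                return False
        elif d_new(x, y) < d(x, y):
            return False
    return True


C = [{0, 1}, {5, 6}]


def d(x, y):
    return abs(x - y)


def pushed_apart(x, y):
    """Adds 10 to every between-cluster distance, keeps the rest."""
    same = (x < 3) == (y < 3)
    return d(x, y) if same else d(x, y) + 10


def pulled_together(x, y):
    """Halves every between-cluster distance, keeps the rest."""
    same = (x < 3) == (y < 3)
    return d(x, y) if same else d(x, y) / 2


print(is_outer_consistent_change(C, d, pushed_apart))     # True
print(is_outer_consistent_change(C, d, pulled_together))  # False
```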

Page 26: 2-Richness

Given any pair of data sets (X, d) and (X’, d’), there exists d* over X ∪ X’, so that X and X’ are the children of the root in A(X ∪ X’, d*).

[Figure: data sets (X,d) and (X’,d’) combined under d*, with X and X’ as the children of the root]

Page 27:

Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup-independent.

Proof: We want to show that, given any data sets (X,d) and (O,d’), if the data sets are placed sufficiently far apart, then A(X,d) is a sub-dendrogram of A(X ∪ O, d*).

[Figure: (X,d) and (O,d’), the combined data set over X ∪ O, and the dendrograms A(X,d) and A(X ∪ O,d*)]

Page 28:

Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup-independent.

Proof: First, apply 2-richness. Given (X,d) and (O,d’), there exists d’’ over X ∪ O so that X and O are children of the root of A(X ∪ O,d’’).

[Figure: (X,d) and (O,d’) combined into (X ∪ O,d’’), with a distance c marked between X and O; X and O are the children of the root of A(X ∪ O,d’’)]

Page 29:

Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup-independent.

Proof: Let d* be any distance function extending d and d’ where the min distance between X and O is at least c.

Then by outer-consistency, X and O are children of the root of A(X ∪ O,d*).

[Figure: (X ∪ O,d’’) and (X ∪ O,d*), with the distance c marked between X and O; X and O are the children of the root of A(X ∪ O,d*)]

Page 30:

Theorem: Any hierarchical algorithm A that is 2-rich, outer-consistent, and local is outgroup-independent.

Proof: Finally, by locality, A(X,d) is a sub-dendrogram of A(X ∪ O,d*).

Therefore, whenever (X,d) and (O,d’) are sufficiently far apart, X is unaffected by O.

[Figure: A(X ∪ O,d*) with X and O as the children of the root, and A(X,d) as a sub-dendrogram of A(X ∪ O,d*)]

Page 31: Linkage Based Algorithm

• Create a leaf node for every element of X

Page 32: Linkage Based Algorithm

• Create a leaf node for every element of X

• Repeat the following until a single tree remains:
– Consider clusters represented by the remaining root nodes.

Page 33: Linkage Based Algorithm

• Create a leaf node for every element of X

• Repeat the following until a single tree remains:
– Consider clusters represented by the remaining root nodes. Merge the closest pair of clusters by assigning them a common parent node.
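The procedure above can be sketched in a few lines. This is a minimal, unoptimized illustration rather than the authors' implementation: trees are nested tuples, the linkage function is passed in as a parameter, ties are broken by position in the list, and the names `linkage_based` and `single_linkage` are chosen for the example. It assumes a non-empty input whose elements are not themselves tuples.

```python
from itertools import combinations


def single_linkage(A, B, d):
    """Shortest distance between a point of A and a point of B."""
    return min(d(a, b) for a in A for b in B)


def linkage_based(points, d, linkage):
    """Build a dendrogram by repeatedly merging the closest pair of root clusters."""
    # One root per element; each entry is (subtree, set of leaves below it).
    forest = [(p, frozenset([p])) for p in points]
    while len(forest) > 1:
        # Closest pair of remaining roots according to the linkage function.
        i, j = min(
            combinations(range(len(forest)), 2),
            key=lambda ij: linkage(forest[ij[0]][1], forest[ij[1]][1], d),
        )
        # Give the two roots a common parent node.
        merged = ((forest[i][0], forest[j][0]), forest[i][1] | forest[j][1])
        forest = [f for k, f in enumerate(forest) if k not in (i, j)]
        forest.append(merged)
    return forest[0][0]


def d(x, y):
    return abs(x - y)


tree = linkage_based([0, 1, 5, 6, 20], d, single_linkage)
print(tree)  # (20, ((0, 1), (5, 6)))
```

Each pass scans all pairs of roots, so the sketch is far slower than practical implementations; it is meant only to mirror the steps on the slide.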

Page 34: Examples of Linkage Based Algorithms

• The choice of linkage function distinguishes between different linkage-based algorithms.

• Examples of common linkage functions
– UPGMA: average between-cluster distance
– Single-linkage: shortest between-cluster distance
– Complete-linkage: maximum between-cluster distance

[Figure: two clusters X1 and X2]
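The three linkage functions listed above differ only in how they summarize the pairwise distances between two clusters. A small sketch, with function names and example data chosen for illustration:

```python
def single_linkage(X1, X2, d):
    """Shortest between-cluster distance."""
    return min(d(a, b) for a in X1 for b in X2)


def complete_linkage(X1, X2, d):
    """Maximum between-cluster distance."""
    return max(d(a, b) for a in X1 for b in X2)


def average_linkage(X1, X2, d):
    """Average between-cluster distance, the linkage used by UPGMA."""
    return sum(d(a, b) for a in X1 for b in X2) / (len(X1) * len(X2))


def d(x, y):
    return abs(x - y)


X1, X2 = {0, 1}, {4, 6}
print(single_linkage(X1, X2, d))    # 3
print(complete_linkage(X1, X2, d))  # 6
print(average_linkage(X1, X2, d))   # 4.5
```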

Page 35:

Theorem: All Linkage-Based algorithms are outgroup-independent.

Proof: We can show that all linkage-based algorithms are 2-rich, outer-consistent, and local. The result follows by the previous Theorem.
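The theorem can be illustrated, though of course not proved, on a toy example. The sketch below assumes single linkage on points of the real line, so the combined distance function is simply the absolute difference; "unaffected" is taken to mean that the ingroup tree appears as a subtree, up to child order, of the tree built on ingroup plus outgroup. All names and data are illustrative.

```python
from itertools import combinations


def single_linkage(A, B, d):
    return min(d(a, b) for a in A for b in B)


def linkage_tree(points, d, linkage=single_linkage):
    """Repeatedly merge the closest pair of root clusters (nested-tuple tree)."""
    forest = [(p, frozenset([p])) for p in points]
    while len(forest) > 1:
        i, j = min(
            combinations(range(len(forest)), 2),
            key=lambda ij: linkage(forest[ij[0]][1], forest[ij[1]][1], d),
        )
        merged = ((forest[i][0], forest[j][0]), forest[i][1] | forest[j][1])
        forest = [f for k, f in enumerate(forest) if k not in (i, j)] + [merged]
    return forest[0][0]


def canonical(node):
    """Order-independent form of a nested-tuple tree."""
    if not isinstance(node, tuple):
        return node
    return frozenset(canonical(child) for child in node)


def subtrees(node):
    yield node
    if isinstance(node, tuple):
        for child in node:
            yield from subtrees(child)


def unaffected(ingroup, outgroup, d):
    """True if the ingroup tree survives as a subtree once the outgroup is added."""
    small = canonical(linkage_tree(ingroup, d))
    large = linkage_tree(ingroup + outgroup, d)
    return any(canonical(t) == small for t in subtrees(large))


def d(x, y):
    return abs(x - y)


X = [0, 1, 5, 6]
far = unaffected(X, [100, 101], d)  # outgroup placed far from the ingroup
near = unaffected(X, [3.5], d)      # extra point sitting inside the ingroup
print(far, near)  # True False
```

The nearby point joins the cluster {5, 6} before the two halves of the ingroup merge, which is why the ingroup tree is broken in the second case.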

Page 36: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 37: Neighbor Joining

• Most widely used distance-based method for phylogenetic reconstruction

• Works well in practice

• If there is a tree that fits the distance matrix (that is, the distances are additive), it will find it
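The talk does not describe how Neighbor Joining chooses which taxa to join, so the following sketch is background rather than material from the slides. It uses the standard selection criterion Q(i,j) = (n − 2)·d(i,j) − Σ_k d(i,k) − Σ_k d(j,k) and shows only the first joining step; the function name and the five-taxon distance table are illustrative.

```python
from itertools import combinations


def nj_pair_to_join(taxa, dist):
    """Return the pair minimizing the standard NJ criterion Q.

    `dist` maps frozenset({i, j}) to the distance between taxa i and j.
    """
    n = len(taxa)
    row_sum = {i: sum(dist[frozenset((i, k))] for k in taxa if k != i) for i in taxa}

    def q(i, j):
        return (n - 2) * dist[frozenset((i, j))] - row_sum[i] - row_sum[j]

    return min(combinations(taxa, 2), key=lambda pair: q(*pair))


taxa = ["a", "b", "c", "d", "e"]
pairs = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(p): v for p, v in pairs.items()}

closest = min(combinations(taxa, 2), key=lambda p: dist[frozenset(p)])
joined = nj_pair_to_join(taxa, dist)
print(closest)  # ('d', 'e'): the pair a linkage-based method would merge first
print(joined)   # ('a', 'b'): the pair Neighbor Joining joins first
```

Because every taxon's distances enter Q through the row sums, the pair chosen depends on the whole matrix and not only on the two clusters being compared, unlike the linkage functions above.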

Page 38:

Theorem: Neighbor joining is outlier volatile.

This remains the case when distances of the ingroup are additive.

Page 39: Outgroups can lead to arbitrary dendrograms

Theorem: Given any data set (X,d), there exists a set of outliers O and a distance function d* over X ∪ O extending d, where d*(X,O) can be arbitrarily large, such that NJ(X ∪ O, d*)|X is an arbitrary dendrogram.

[Figure: A(X,d) compared with A(X ∪ O,d*)|X]

Page 40: Outline

• Rooting Phylogenetic Trees

• Formal Framework

• Properties of Hierarchical Algorithms

• Analysis of Linkage-Based Algorithms

• Analysis of Neighbor Joining

• Conclusions and Future Direction

Page 41: Conclusions

• Present a formal framework for the analysis of the effects of outgroups on the ingroup topology for computationally efficient hierarchical algorithms

• Prove that all Linkage-Based algorithms, which include UPGMA, are outgroup independent

• Prove that NJ is outgroup volatile

• This only addresses rooting. We do not claim that UPGMA is in general better than NJ.

Page 42: Future Work

• How to choose outgroups for rooting NJ?

• Perform a similar analysis of Likelihood methods