A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees

Post on 29-Jan-2016


José Augusto Amgarten Quitzau, João Meidanis

Scylla Bioinformatics, Brazil; University of Campinas, Brazil

Phylogeny reconstruction methods

Phylogeny reconstruction methods aim at inferring the phylogenetic tree that best describes the evolutionary history for a set of taxa.

Which tree to choose?

“The field of systematics has been in considerable turmoil as various investigators developed different methods of classification and argued their merits. I guarantee you that no one method or view has all the good points.”

Walter M. Fitch – 1984

Consensus as tree constructor

Consensus trees have been used traditionally in tree comparison and calculation of bootstrap values

We propose the use of consensus as a tree constructor

It can be efficiently implemented as long as we keep trees fully resolved

Splits

Every edge in a phylogenetic tree divides the leaves into two subgroups.

Each such pair of subgroups is a split of the tree.

[Figure: example tree on leaves A, B, C, D, E, F, G, H]
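To make the notion of a split concrete, here is a minimal Python sketch that enumerates the splits of a tree. The nested-tuple encoding and the helper names `leaves`, `splits`, and `show` are assumptions made for illustration; they do not come from the slides. The example tree is the one implied by the split list shown later in the talk.

```python
from itertools import chain

def leaves(node):
    """Return the set of leaf labels below a nested-tuple node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset(chain.from_iterable(leaves(child) for child in node))

def splits(tree):
    """Return every split of the tree as a frozenset {side, other_side}."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        side = leaves(node)
        other = all_leaves - side
        if other:  # the top-level node has no edge above it
            found.add(frozenset([side, other]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found

def show(split):
    """Format a split as text, smaller side first."""
    a, b = sorted(split, key=lambda s: (len(s), sorted(s)))
    return "".join(sorted(a)) + " | " + "".join(sorted(b))

# Hypothetical encoding of the example tree on leaves A to H.
example = ((("A", "B"), ("C", "D")), ((("E", "F"), "G"), "H"))

for line in sorted(show(s) for s in splits(example)):
    print(line)
```

The two edges next to the top-level node of the encoding describe the same split (ABCD | EFGH), so storing each split as an unordered pair keeps it from being counted twice.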

Tree weight

Our method relies on weighing trees and taking the one with maximum weight

Let the frequency of a split in a collection of trees be the number of trees that contain the split divided by the total number of trees in the collection

Let the weight of an unrooted phylogenetic tree be the product of the frequencies of its splits
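The two definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the four input trees are invented, each tree on leaves A to E is stored only as its non-trivial splits, and each split is written as the side that does not contain leaf E. Trivial splits (a single leaf against the rest) occur in every tree, have frequency 1, and so do not change the product.

```python
from collections import Counter
from math import prod

def split_frequencies(trees):
    """Fraction of the trees in the collection that contain each split."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}

def weight(tree, freq):
    """Product of the frequencies of the tree's splits (0 for unseen splits)."""
    return prod(freq.get(split, 0.0) for split in tree)

# Hypothetical collection of four fully resolved trees on leaves A..E.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]

freq = split_frequencies(trees)
for tree in trees:
    print(sorted("".join(sorted(s)) for s in tree), weight(tree, freq))
```

Note that this sketch only scores the input trees. A tree of maximum weight need not be one of the inputs, which is why a dedicated search algorithm is needed.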

Most probable tree

A most probable tree for a collection of fully resolved phylogenetic trees is a tree that maximizes the weight.
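The formula from the original slide is not preserved in this transcript. Based on the definitions of frequency and weight given earlier, it can be written along the following lines, where the symbols \Sigma(T) (the set of splits of a tree T), T_1, ..., T_t (the input trees), and T^* (a most probable tree) are notation assumed here rather than taken from the slides:

```latex
f(s) = \frac{\left|\{\, i : s \in \Sigma(T_i) \,\}\right|}{t},
\qquad
w(T) = \prod_{s \in \Sigma(T)} f(s),
\qquad
T^{*} \in \arg\max_{T} \; w(T)
```

The maximum is taken over fully resolved trees on the same leaf set as the input trees.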

Example

Solution

w = 0.0703125

Running time

The tree weight formula can be written as a product of the frequencies of the small subgroups

We designed an algorithm that finds all most probable trees for a given set of fully resolved phylogenetic trees

The complexity of the algorithm is O(l^3 t^2 log(lt)), where l is the number of leaves and t is the number of trees

Experiments

Data sets used to test the new method:

Synthetic data: from Gascuel’s LIRMM site

K2P – Kimura 2 Parameter, no MC

K2Pm – Kimura 2 Parameter, with MC

COV – Covarion model, no MC

COVm – Covarion model, with MC

Real data: Ribosomal RNA

Experiments

Programs used to test the new method (19):

Software    Method                       Model
fastMe      Minimum evolution            JC, K2P
Mega        Minimum evolution            JC, K2P, TN
Mega        Maximum parsimony
Mega        Neighbor joining             JC, K2P, TN
dnacomp     DNA compatibility
dnaml       Maximum likelihood
dnapars     Maximum parsimony
neighbor    Neighbor joining             JC, K2P
neighbor    UPGMA                        JC, K2P
weighbor    Weighted neighbor joining    JC, K2P

Most probable = Median

Reflects general tendency

Results: average split distance

Data set     Minimum Distance
K2P          43.44
K2Pm         77.78
COV          52.67
COVm         69.11
Ribosomal    60.71

Consensus consistently yields minimum average split distance

May result in better tree
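The idea of an average split distance can be sketched as follows. The slides do not spell out the exact distance or its normalization, so this sketch assumes the usual definition: the number of splits present in exactly one of the two trees. The input trees are invented, stored as sets of non-trivial splits on leaves A to E, with each split written as the side that does not contain E.

```python
def split_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(tree_a ^ tree_b)

def average_split_distance(candidate, trees):
    """Mean split distance from a candidate tree to every tree in a collection."""
    return sum(split_distance(candidate, t) for t in trees) / len(trees)

# Hypothetical collection of four fully resolved trees on leaves A..E.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]

for tree in trees:
    label = sorted("".join(sorted(s)) for s in tree)
    print(label, average_split_distance(tree, trees))
```

In this toy collection the tree with splits AB and ABC has the smallest average distance to the others, which illustrates the sense in which a consensus can act as a median of the input trees.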

Results: distance to “real” tree

Data set     Consensus not worse than ... of input trees
K2P          72 %
K2Pm         39 %
COV          78 %
COVm         72 %
Ribosomal    100 %

Consensus consistently not worse off than majority of input trees

Theoretical foundations

[Figure: example tree on leaves A, B, C, D, E, F, G, H]

All splits of a tree

[Figure: example tree on leaves A, B, C, D, E, F, G, H, with its splits]

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH

Small subgroup of each split

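[Figure: example tree on leaves A, B, C, D, E, F, G, H, with the small subgroup of each split highlighted; it is the side written first below]

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH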
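Picking the small subgroup of each split can be sketched in Python. The slides do not state the rule for splits with two sides of equal size, so this sketch assumes that the smaller side is taken and that ties go to the side that comes first alphabetically. The function name `small_subgroup` is hypothetical.

```python
def small_subgroup(split_text):
    """Return the smaller side of a split written as 'X | Y'.

    Assumption: ties are broken in favor of the alphabetically first side.
    """
    left, right = (frozenset(part.strip()) for part in split_text.split("|"))
    return min(left, right, key=lambda side: (len(side), sorted(side)))

split_list = [
    "A | BCDEFGH", "B | ACDEFGH", "AB | CDEFGH", "C | ABDEFGH",
    "D | ABCEFGH", "H | ABCDEFG", "G | ABCDEFH", "F | ABCDEGH",
    "E | ABCDFGH", "CD | ABEFGH", "EF | ABCDGH", "EFG | ABCDH",
    "ABCD | EFGH",
]

small = ["".join(sorted(small_subgroup(s))) for s in split_list]
print(", ".join(small))
```

The result does not depend on which side is written first, so `small_subgroup("CDEFGH | AB")` also yields the subgroup AB.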

Small subgroups

A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD

Maximal clusters (n-trees)

A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD

Fundamental theoretical result

A, B, AB, C, D, H, G, F, E, CD, EF, EFG, ABCD

● The small subgroup set of a phylogenetic tree is always a finite set of n-trees

● There are exactly three n-trees in this set, and all n-trees are maximal if and only if the phylogenetic tree is fully resolved
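The structure behind this result can be explored with a short Python sketch on the example's small subgroups. The slides do not define "n-tree", so the sketch assumes the nesting property usually associated with the term: any two clusters are either disjoint or one contains the other. The function names are hypothetical.

```python
def is_nested_family(clusters):
    """True when every two clusters are either disjoint or nested."""
    return all(
        a.isdisjoint(b) or a <= b or b <= a
        for a in clusters
        for b in clusters
    )

def maximal_clusters(clusters):
    """Clusters that are not strictly contained in any other cluster."""
    return [c for c in clusters if not any(c < other for other in clusters)]

# Small subgroups of the example tree on leaves A..H.
small_subgroups = [
    frozenset(text)
    for text in "A B AB C D H G F E CD EF EFG ABCD".split()
]

print(is_nested_family(small_subgroups))
print(sorted("".join(sorted(c)) for c in maximal_clusters(small_subgroups)))
```

For this example the sketch reports three maximal clusters (ABCD, EFG, and H), which agrees with the count of three stated in the result above.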

Implementation details

Dynamic programming

[Figure, shown over several slides: dynamic programming over the subgroups D, E, F, G, EF, GH, DE, DEF, FGH and ABC, together with the complement L \ ABC]

To Do List

Rooted trees

Polytomies

Non-uniform weights for input trees

Acknowledgments

Scylla Bioinformatics and Institute of Computing, Unicamp, for machine time, infrastructure, and support

Brazilian Research Financing Agency CNPq, grant 470420/2004-9