A Fully Resolved Consensus Between Fully Resolved Phylogenetic Trees


José Augusto Amgarten Quitzau, João Meidanis

Scylla Bioinformatics, Brazil; University of Campinas, Brazil

Phylogeny reconstruction methods

Phylogeny reconstruction methods aim at inferring the phylogenetic tree that best describes the evolutionary history for a set of taxa.

Which tree to choose?

“The field of systematics has been in considerable turmoil as various investigators developed different methods of classification and argued their merits. I guarantee you that no one method or view has all the good points.”

Walter M. Fitch – 1984

Consensus as tree constructor

Consensus trees have been used traditionally in tree comparison and calculation of bootstrap values

We propose the use of consensus as a tree constructor

It can be efficiently implemented as long as we keep trees fully resolved

Splits

Every edge in a phylogenetic tree divides the leaves into two subgroups.

Each such pair of subgroups is a split of the tree.
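The definition of a split can be made concrete with a minimal sketch. The nested-tuple tree representation and the names `leaf_set` and `splits` are assumptions made for illustration; they do not come from the slides.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a string, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Split = FrozenSet[FrozenSet[str]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def splits(tree: Tree) -> Set[Split]:
    """Return every split of the unrooted tree, one per edge."""
    everything = leaf_set(tree)
    found: Set[Split] = set()

    def visit(node: Tree) -> None:
        below = leaf_set(node)
        if below != everything:  # the artificial root has no edge above it
            found.add(frozenset([below, everything - below]))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found


# Eight-leaf fully resolved tree; the rooting point is arbitrary.
example = ((("A", "B"), ("C", "D")), ((("E", "F"), "G"), "H"))
example_splits = splits(example)
print(len(example_splits))  # one split per edge of the unrooted tree
```

A split is stored as an unordered pair of leaf sets, so the two children of the artificial root produce the same split and are counted once.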

Tree weight

Our method relies on weighing trees and taking the one with maximum weight

Let the frequency of a split in a collection of trees be the number of trees that contain the split divided by the total number of trees in the collection

Let the weight of an unrooted phylogenetic tree be the product of the frequencies of its splits
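The two definitions above can be sketched in a few lines. This is an illustration under stated assumptions: each tree is given directly as a set of splits, the four-leaf collection is invented, and the helper names `split`, `tree`, `split_frequencies`, and `tree_weight` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import prod

LEAVES = frozenset("ABCD")


def split(side):
    """Build the split that separates `side` from the remaining leaves."""
    part = frozenset(side)
    return frozenset([part, LEAVES - part])


def tree(internal_side):
    """Four-leaf tree: the four trivial splits plus one internal split."""
    return frozenset({split(leaf) for leaf in LEAVES} | {split(internal_side)})


def split_frequencies(trees):
    """Fraction of the trees in the collection that contain each split."""
    counts = Counter(s for t in trees for s in t)
    return {s: Fraction(c, len(trees)) for s, c in counts.items()}


def tree_weight(t, freq):
    """Product of the frequencies of the splits of t (0 for unseen splits)."""
    return prod((freq.get(s, Fraction(0)) for s in t), start=Fraction(1))


# Invented collection: three copies of AB|CD and one copy of AC|BD.
collection = [tree("AB"), tree("AB"), tree("AB"), tree("AC")]
freq = split_frequencies(collection)
print(tree_weight(tree("AB"), freq))
print(tree_weight(tree("AC"), freq))
print(tree_weight(tree("AD"), freq))
```

Trivial splits occur in every tree, so their frequency is 1 and only the internal splits affect the weight. Exact fractions are used to avoid rounding in the product.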

Most probable tree

A most probable tree for a collection of fully resolved phylogenetic trees is a tree that maximizes the weight.
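In symbols, the definitions of frequency, weight, and most probable tree can be written as follows. The notation is an assumption made here, not taken from the slides: the collection is $T_1, \dots, T_t$, $S(T)$ is the set of splits of a tree $T$, and the maximum is taken over fully resolved trees on the same leaf set.

```latex
f(s) = \frac{\left|\{\, i : s \in S(T_i) \,\}\right|}{t},
\qquad
w(T) = \prod_{s \in S(T)} f(s),
\qquad
T^{*} \in \arg\max_{T} \; w(T)
```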

Example

Solution

w = 0.0703125

Running time

The tree weight formula can be written as a product of the frequencies of the small subgroups

We designed an algorithm that finds all most probable trees for a given set of fully resolved phylogenetic trees

The complexity of the algorithm is O(l^3 t^2 log(lt)), where l is the number of leaves and t is the number of trees
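For contrast with the polynomial bound, the following sketch finds all most probable trees by exhaustive enumeration. It is not the authors' algorithm: it only applies the definition directly, and it is usable for a handful of leaves because the number of candidate trees grows as (2l-5)!!. The nested-tuple representation and every function name are assumptions.

```python
from collections import Counter
from fractions import Fraction
from math import prod


def leaf_set(tree):
    """Set of leaf labels below a node (leaf = string, internal node = pair)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaf_set(tree[0]) | leaf_set(tree[1])


def splits(tree):
    """All splits of the unrooted tree, as unordered pairs of leaf sets."""
    everything = leaf_set(tree)
    found, stack = set(), [tree]
    while stack:
        node = stack.pop()
        below = leaf_set(node)
        if below != everything:
            found.add(frozenset([below, everything - below]))
        if not isinstance(node, str):
            stack.extend(node)
    return frozenset(found)


def insert_leaf(tree, leaf):
    """Yield each tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # attach on the edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_trees(leaves):
    """Every fully resolved unrooted tree, rooted at the first leaf's edge."""
    first, second, *rest = leaves
    trees = [second]
    for leaf in rest:
        trees = [t for old in trees for t in insert_leaf(old, leaf)]
    return [(first, t) for t in trees]


def most_probable_trees(input_trees):
    """Return (list of maximum-weight trees, their weight)."""
    split_sets = [splits(t) for t in input_trees]
    counts = Counter(s for tree_splits in split_sets for s in tree_splits)
    freq = {s: Fraction(c, len(split_sets)) for s, c in counts.items()}
    best, best_weight = [], Fraction(-1)
    for candidate in all_trees(sorted(leaf_set(input_trees[0]))):
        w = prod((freq.get(s, Fraction(0)) for s in splits(candidate)),
                 start=Fraction(1))
        if w > best_weight:
            best, best_weight = [candidate], w
        elif w == best_weight:
            best.append(candidate)
    return best, best_weight


# Invented input: two copies of one tree and one tree that disagrees on AB.
t1 = (("A", "B"), ("C", ("D", "E")))
t3 = (("A", "C"), ("B", ("D", "E")))
winners, weight = most_probable_trees([t1, t1, t3])
print(winners, weight)
```

In this toy input the split DE|ABC appears in every tree and AB|CDE in two of three, so the unique winner has the same splits as `t1`.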

Experiments

Data sets used to test the new method:

Synthetic data: from Gascuel’s LIRMM site

K2P – Kimura 2 Parameter, no MC

K2Pm – Kimura 2 Parameter, with MC

COV – Covarion model, no MC

COVm – Covarion model, with MC

Real data: Ribosomal RNA

Experiments

Programs used to test the new method (19):

Software   Method                      Model
fastMe     Minimum evolution           JC, K2P
Mega       Minimum evolution           JC, K2P, TN
Mega       Maximum parsimony
Mega       Neighbor joining            JC, K2P, TN
dnacomp    DNA compatibility
dnaml      Maximum likelihood
dnapars    Maximum parsimony
neighbor   Neighbor joining            JC, K2P
neighbor   UPGMA                       JC, K2P
weighbor   Weighted neighbor joining   JC, K2P

Most probable = Median

Reflects general tendency

Results: average split distance

Data set    Minimum Distance
K2P         43.44
K2Pm        77.78
COV         52.67
COVm        69.11
Ribosomal   60.71

Consensus consistently yields minimum average split distance

May result in better tree
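The slides do not define the split distance. The sketch below assumes the common reading: the number of splits that occur in exactly one of the two trees. The four-leaf collection and all function names are invented for illustration.

```python
LEAVES = frozenset("ABCD")


def split(side):
    """Build the split that separates `side` from the remaining leaves."""
    part = frozenset(side)
    return frozenset([part, LEAVES - part])


def tree(internal_side):
    """Four-leaf tree: the four trivial splits plus one internal split."""
    return frozenset({split(leaf) for leaf in LEAVES} | {split(internal_side)})


def split_distance(t1, t2):
    """Number of splits present in exactly one of the two trees (assumed)."""
    return len(t1 ^ t2)


def average_split_distance(candidate, trees):
    """Mean split distance from `candidate` to every tree in the collection."""
    return sum(split_distance(candidate, t) for t in trees) / len(trees)


# Invented collection: three copies of AB|CD and one copy of AC|BD.
collection = [tree("AB"), tree("AB"), tree("AB"), tree("AC")]
for side in ("AB", "AC", "AD"):
    print(side, average_split_distance(tree(side), collection))
```

In this toy collection the tree built from the most frequent split also has the smallest average distance to the inputs, which is the behaviour the results slide reports for the consensus.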

Results: distance to “real” tree

Data set    Consensus Not Worse Than … of input trees
K2P         72 %
K2Pm        39 %
COV         78 %
COVm        72 %
Ribosomal   100 %

Consensus consistently not worse off than majority of input trees

Theoretical foundations

All splits of a tree

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH

Small subgroup of each split

The small subgroup is written on the left of each split:

A | BCDEFGH
B | ACDEFGH
AB | CDEFGH
C | ABDEFGH
D | ABCEFGH
H | ABCDEFG
G | ABCDEFH
F | ABCDEGH
E | ABCDFGH
CD | ABEFGH
EF | ABCDGH
EFG | ABCDH
ABCD | EFGH
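Picking the small subgroup of each split can be sketched as below. The slides do not state how a tie between equal-sized sides is resolved; breaking it alphabetically is an assumption that happens to reproduce the choice of ABCD for the split ABCD | EFGH. The function name is hypothetical.

```python
# The thirteen splits of the eight-leaf example tree.
SPLITS = [
    "A|BCDEFGH", "B|ACDEFGH", "AB|CDEFGH", "C|ABDEFGH", "D|ABCEFGH",
    "H|ABCDEFG", "G|ABCDEFH", "F|ABCDEGH", "E|ABCDFGH", "CD|ABEFGH",
    "EF|ABCDGH", "EFG|ABCDH", "ABCD|EFGH",
]


def small_subgroup(split: str) -> str:
    """Return the smaller side of a split written as 'X|Y'.

    Ties are broken alphabetically, which is an assumption.
    """
    sides = ["".join(sorted(side)) for side in split.split("|")]
    return min(sides, key=lambda side: (len(side), side))


small_subgroups = [small_subgroup(s) for s in SPLITS]
print(small_subgroups)
```

The result does not depend on which side is written first, so `small_subgroup("EFGH|ABCD")` gives the same answer as `small_subgroup("ABCD|EFGH")`.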

Small subgroups

EFG

ABCD

Maximal clusters (n-trees)

EFG

ABCD

Fundamental theoretical result

[Figure labels: A, B, AB, C, D, H, G, F, E, EF, EFG, ABCD]

● The small subgroup set of a phylogenetic tree is always a finite set of n-trees

● There are exactly three n-trees in this set, and all n-trees are maximal if and only if the phylogenetic tree is fully resolved
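The statement can be checked on the eight-leaf example with a short sketch. It assumes one reading of the slides: each n-tree is identified by a maximal small subgroup, and the small subgroups are pairwise nested or disjoint. The function names are hypothetical.

```python
from itertools import combinations

# Small subgroups of the eight-leaf example tree.
SMALL_SUBGROUPS = ["A", "B", "AB", "C", "D", "H", "G", "F", "E",
                   "CD", "EF", "EFG", "ABCD"]


def is_hierarchy(groups):
    """True if every two groups are either nested or disjoint."""
    sets = [frozenset(g) for g in groups]
    return all(a <= b or b <= a or not (a & b)
               for a, b in combinations(sets, 2))


def maximal_groups(groups):
    """Groups that are not strictly contained in any other group."""
    sets = [frozenset(g) for g in groups]
    return [g for g, s in zip(groups, sets)
            if not any(s < other for other in sets)]


print(is_hierarchy(SMALL_SUBGROUPS))
print(maximal_groups(SMALL_SUBGROUPS))
```

For this fully resolved tree the maximal small subgroups are H, EFG, and ABCD, which agrees with the count of three n-trees stated above.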

Implementation details

[Figure labels: D, E, F, G, EF, GH, ABC]

Dynamic programming

[Figure labels: D, E, F, G, EF, GH; FGH, DEF, ABC, D, E, DE; ABC]

To Do List

Rooted trees

Polytomies

Non-uniform weights for input trees

Acknowledgments

Scylla Bioinformatics and Institute of Computing, Unicamp, for machine time, infrastructure, and support

Brazilian Research Financing Agency CNPq, grant 470420/2004-9
