SuperTriplets: a triplet-based supertree approach to phylogenomics


Page 1: SuperTriplets:  a triplet-based supertree approach to phylogenomics

SuperTriplets: a triplet-based supertree approach to phylogenomics

Vincent Ranwez, Alexis Criscuolo and Emmanuel J.P. Douzery

Page 2: SuperTriplets:  a triplet-based supertree approach to phylogenomics

SuperTriplets: ISMB 2010

Introduction: inferring phylogeny (1 gene)

Page 3: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Introduction: inferring phylogeny (3 genes)

Figure: data sets for Gene 1, Gene 2 and Gene 3, combined by either the SuperTree or the SuperMatrix approach.

Page 4: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Introduction: inferring phylogeny (more data)

Figure: data sets from Gene 2 up to Gene 1000, plus other data sources (SNP / Morpho / biblio), combined by either the SuperTree or the SuperMatrix approach.

Page 5: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: MRP

0100101001?11?0100

01??0?011?0???0010

??0011010??001????

0100010??00??001?0

111??0101000????01

MRP [Baum 1992, Ragan 1992]
• 1 binary sequence per taxon
• 1 site per clade (1 = in the clade; 0 = outside; ? = missing)
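As a minimal sketch of the encoding described above, the following Python builds one binary sequence per taxon from a small forest. The nested-tuple tree representation, the function names and the two example trees are assumptions made for illustration; they are not taken from the slides, and the all-zero outgroup row that MRP analyses usually add for rooting is left out.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every internal node except the root."""
    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            yield frozenset(leaves(node))
        for child in node:
            yield from walk(child, False)
    yield from walk(tree, True)


def mrp_matrix(forest):
    """Map each taxon to its MRP sequence: one site per clade of each source tree."""
    taxa = sorted(set().union(*(leaves(tree) for tree in forest)))
    rows = {taxon: "" for taxon in taxa}
    for tree in forest:
        tree_taxa = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in tree_taxa:
                    rows[taxon] += "?"   # taxon missing from this source tree
                elif taxon in clade:
                    rows[taxon] += "1"   # taxon inside the clade
                else:
                    rows[taxon] += "0"   # taxon present but outside the clade
    return rows


# Hypothetical forest of two overlapping rooted trees.
forest = [((("A", "B"), "C"), "D"), (("A", "B"), ("D", "E"))]
matrix = mrp_matrix(forest)
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

Taxon C is absent from the second tree and taxon E from the first, which is where the "?" entries come from.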

Figure: trees with leaf orders ABCDEF, CDEABF and CDEFBA, illustrating MRP.

MRP can output a relation contradicted by all source trees [Goloboff and Pol, 2002].

Page 6: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: intuitive approach

The Supertree problem (intuitive formulation)
• Input: a collection of overlapping trees (a forest)
• Output: the tree that best represents this collection
• A major question is: how to define "best represents"?

Visualizing supertree candidates within the tree space

Median supertree
• Intuitive solution
• Generalization of the consensus tree
• Good theoretical properties [Steel and Rodrigo, 2008]

Page 7: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: median tree

d( , ) = + -

Tree decomposition as:
• split set
• quartet set
• triplet set

Tree restriction

Initial trees
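To make the triplet decomposition and the restriction step concrete, here is a small Python sketch. It assumes rooted trees written as nested tuples, encodes a triplet xy|z as `(frozenset({x, y}), z)`, and takes the distance to be the size of the symmetric difference of the two triplet sets after restricting both trees to their shared taxa. All names and the example trees are hypothetical, and the exact distance shown on the slide is not recoverable from the transcript.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return all rooted triplets xy|z displayed by the tree."""
    if isinstance(tree, str):
        return set()
    found = set()
    all_leaves = leaves(tree)
    for child in tree:
        found |= triplets(child)
        inside = leaves(child)
        # x and y sit in the same child subtree, z in another one: xy|z.
        for x, y in combinations(sorted(inside), 2):
            for z in all_leaves - inside:
                found.add((frozenset((x, y)), z))
    return found


def restrict(triplet_set, taxa):
    """Keep only the triplets whose three leaves all belong to taxa."""
    return {(pair, z) for pair, z in triplet_set if pair <= taxa and z in taxa}


def triplet_distance(tree_1, tree_2):
    """Symmetric difference of triplet sets on the taxa shared by both trees."""
    common = leaves(tree_1) & leaves(tree_2)
    return len(restrict(triplets(tree_1), common) ^ restrict(triplets(tree_2), common))


tree_1 = (("A", "B"), ("C", "D"))
tree_2 = ((("A", "C"), "B"), "E")
distance = triplet_distance(tree_1, tree_2)
print(distance)
```

Only A, B and C are shared here, so the comparison reduces to AB|C in the first tree against AC|B in the second.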

Page 8: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: MRP and median tree

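Figure: input forest of rooted trees T1 (leaves A, B, C, D, E), T2 and T3 (leaves A, B, C, F, G, H), each decomposed into its triplets (AB|C, AB|D, …, GH|F, …, FH|G, …).

Triplet MR: one site per triplet, one column per taxon A to H, plus a final rooting column.

Triplet  ABCDEFGH  Rooting
AB|C     110?????  0
AB|D     11?0????  0
…
FH|G     ?????101  0
…
GH|F     ?????011  0
…

MRP matrix of the input forest:

0100101001?11?0100
01??0?011?0???0010
??0011010??001????
0100010??00??001?0
111??0101000????01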

Page 9: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: MRP and median tree

The parsimony value is related to the triplet distance:
• 1 parsimony step for triplets within the supertree
• 2 parsimony steps for others
• parsimony score = nbSites + (triplet distance)/2

The MRP approach is ill-suited to triplet encoding:
• for 100 taxa, 97% of "?"
• for 1000 taxa, 99.7% of "?"
• unnecessarily huge matrices
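The two percentages are consistent with a simple count: when each site encodes a single triplet, only 3 of the n taxon cells of that site hold 0 or 1, and the other n - 3 hold "?". The short Python check below assumes that reading and ignores the rooting column; the function name is hypothetical.

```python
from math import comb


def missing_fraction(n_taxa):
    """Fraction of '?' cells when each site encodes one triplet among n_taxa taxa."""
    return (n_taxa - 3) / n_taxa


for n_taxa in (100, 1000):
    # A fully resolved rooted tree on n taxa displays comb(n, 3) triplets,
    # so this is also the number of sites it would contribute.
    print(n_taxa, f"{missing_fraction(n_taxa):.1%}", comb(n_taxa, 3))
```

A single fully resolved tree on 1000 taxa would already contribute 166 167 000 sites under this encoding, which illustrates why the matrices become unnecessarily large.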

Page 10: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: a few notations

Given a forest F of input trees:
• N+(xy|z): number of occurrences of xy|z in F
• N-(xy|z) = N+(xz|y) + N+(yz|x) (alternative resolutions in F)
• Input trees are then no longer needed (forest size has little impact)

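Searching for the (asymmetric) triplet median tree T:

median:

$$d_3(T,F) = \sum_{T_i \in F} d_3(T,T_i)$$

$$d_3(T,F) = \sum_{xy|z \,\in\, \mathrm{triplets}(T)} \bigl( 2\,N^-(xy|z) + N(x|y|z) \bigr) + \sum_{x|y|z \,\in\, \mathrm{triplets}(T)} \bigl( N^+(xy|z) + N^-(xy|z) \bigr)$$

with N(x|y|z) the number of occurrences of the unresolved triplet x|y|z in F.

asymmetric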
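The N+ and N- counts can be gathered in a single pass over the forest, after which the input trees are no longer needed. The Python sketch below assumes nested-tuple rooted trees and a `Counter` keyed by `(frozenset({x, y}), z)`; these choices, the function names and the three-tree forest are illustrative only.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return all rooted triplets xy|z displayed by the tree."""
    if isinstance(tree, str):
        return set()
    found = set()
    all_leaves = leaves(tree)
    for child in tree:
        found |= triplets(child)
        inside = leaves(child)
        for x, y in combinations(sorted(inside), 2):
            for z in all_leaves - inside:
                found.add((frozenset((x, y)), z))
    return found


def count_n_plus(forest):
    """N+(xy|z): number of input trees displaying xy|z."""
    n_plus = Counter()
    for tree in forest:
        n_plus.update(triplets(tree))
    return n_plus


def n_minus(n_plus, x, y, z):
    """N-(xy|z) = N+(xz|y) + N+(yz|x), the alternative resolutions in F."""
    return n_plus[(frozenset((x, z)), y)] + n_plus[(frozenset((y, z)), x)]


forest = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
n_plus = count_n_plus(forest)
print(n_plus[(frozenset("AB"), "C")], n_minus(n_plus, "A", "B", "C"))
```

Two trees support AB|C and one supports AC|B, so N+(AB|C) is 2 and N-(AB|C) is 1.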

Page 11: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: general overview

• triplet decomposition: N+(homo pan|mus), N-(homo pan|mus), N+(pan bos|mus), N-(pan bos|mus), N+(homo pan|bos), N-(homo pan|bos), N+(mus pan|bos), N-(mus pan|bos), …; O(n^3 |F|)
• first sketch: NJ-like strategy; O(n^3) + consistency
• improvement: NNI local search; O(n^3) to test all branches once
• branch support and collapse; O(n^3)

Page 12: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: agglomerative process

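Figure: agglomerative construction on taxa A, B, C, D, E
• T0: starting tree, no cluster agglomerated yet
• T1: agglomeration of C1={A} and C2={B}, resolving AB|C, AB|D, AB|E
• T2: agglomeration of C1={D} and C2={E}, resolving DE|A, DE|B, DE|C
• T3: agglomeration of C1={A,B} and C2={C}, resolving AC|D, BC|D, AC|E, BC|E

Triplets(T3) is the union of the triplets resolved at these three steps.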

Page 13: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: agglomerative process

Agglomeration of (C_A, C_B)
• Transforms T into T'
• Resolves some new triplets (AB|X) with A ∈ C_A, B ∈ C_B, X ∉ {C_A ∪ C_B}

d3(T',F) = d3(T,F) - ( ∑ N+(AB|X) - ∑ N-(AB|X) )

We select the pair maximizing Score(C_A, C_B) = ( ∑ N+(AB|X) - ∑ N-(AB|X) ) / ( ∑ N+(AB|X) + ∑ N-(AB|X) )

The whole process is O(n^3): when C_A and C_B are agglomerated,
• Score(C_D, C_E) is unchanged
• Score(C_{AB}, C_D) is easily derived from Score(C_A, C_D) and Score(C_B, C_D)
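The agglomeration rule can be sketched in Python as follows. This is a deliberately naive version written for readability: it recomputes every score from scratch at each step instead of using the incremental updates that give the O(n^3) bound, it always merges down to two clusters, and the function names, the dictionary layout of N+ and the example counts (the ten triplets of the tree (((A,B),C),(D,E)), each seen once) are assumptions.

```python
from itertools import combinations


def trip(x, y, z):
    """Key for the rooted triplet xy|z."""
    return (frozenset((x, y)), z)


def score(c_a, c_b, taxa, n_plus):
    """(sum N+ - sum N-) / (sum N+ + sum N-) over AB|X, A in c_a, B in c_b, X outside both."""
    plus = minus = 0
    for a in c_a:
        for b in c_b:
            for x in taxa - c_a - c_b:
                plus += n_plus.get(trip(a, b, x), 0)
                minus += n_plus.get(trip(a, x, b), 0) + n_plus.get(trip(b, x, a), 0)
    total = plus + minus
    return (plus - minus) / total if total else 0.0


def agglomerate(taxa, n_plus):
    """Greedily merge the best-scoring pair of clusters until two remain."""
    taxa = frozenset(taxa)
    subtrees = {frozenset([taxon]): taxon for taxon in taxa}
    while len(subtrees) > 2:
        # Sorting the clusters makes tie-breaking deterministic.
        ordered = sorted(subtrees, key=sorted)
        c_a, c_b = max(
            combinations(ordered, 2),
            key=lambda pair: score(pair[0], pair[1], taxa, n_plus),
        )
        merged = (subtrees.pop(c_a), subtrees.pop(c_b))
        subtrees[c_a | c_b] = merged
    return tuple(subtrees[cluster] for cluster in sorted(subtrees, key=sorted))


names = ["ABC", "ABD", "ABE", "DEA", "DEB", "DEC", "ACD", "BCD", "ACE", "BCE"]
n_plus = {trip(*name): 1 for name in names}
tree = agglomerate("ABCDE", n_plus)
print(tree)
```

On this input the pairs ({A},{B}) and ({D},{E}) both reach the maximum score of 1 at the first step; ties are broken by cluster order here, so the merge order can differ from a figure while the resulting tree is the same.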

Page 14: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: NNI optimisation

The variation d3(T',F) - d3(T,F) depends on few triplets. All these variations are initially evaluated in O(n^3).

Once an NNI is done, few NNIs have to be re-evaluated (4 adjacent edges). NNI optimisation is therefore very fast.

2 possible NNIs per edge

Figure: tree T and its NNI neighbour T'
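A possible way to evaluate the two NNIs around one edge is sketched below in Python. It assumes a rooted configuration ((left, right), sibling), in which only the triplets with one leaf in each of the three subtrees change, and it assumes that each resolved triplet contributes N- minus N+ to the criterion, in line with the agglomeration update. The function names and the example counts are hypothetical.

```python
def trip(x, y, z):
    """Key for the rooted triplet xy|z."""
    return (frozenset((x, y)), z)


def cost(x, y, z, n_plus):
    """Assumed contribution N-(xy|z) - N+(xy|z) of one resolved triplet."""
    plus = n_plus.get(trip(x, y, z), 0)
    minus = n_plus.get(trip(x, z, y), 0) + n_plus.get(trip(y, z, x), 0)
    return minus - plus


def nni_deltas(left, right, sibling, n_plus):
    """Variation of the criterion for the two NNIs around the edge above (left, right)."""
    swap_left = swap_right = 0
    for l in left:
        for r in right:
            for s in sibling:
                old = cost(l, r, s, n_plus)                 # lr|s in the current tree
                swap_left += cost(s, r, l, n_plus) - old    # ((sibling, right), left)
                swap_right += cost(l, s, r, n_plus) - old   # ((left, sibling), right)
    return swap_left, swap_right


# Hypothetical counts: AB|C seen once, AC|B seen three times.
n_plus = {trip("A", "B", "C"): 1, trip("A", "C", "B"): 3}
deltas = nni_deltas({"A"}, {"B"}, {"C"}, n_plus)
print(deltas)
```

A negative value means the corresponding NNI improves the criterion; here the swap that groups A with C is favoured because AC|B is the majority resolution.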

Page 15: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: edge supports

Local support: ∑ N+(t) / [ ∑ N+(t) + ∑ N-(t) ], summed over the triplets t resolved by the edge. If < 0.5, collapsing the edge improves d3(T,F).

Global support: also takes into account the triplets whose N+ and N- counts impact two edges.

Final edge support: min (local, global)
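The local support can be illustrated with a short Python sketch. It assumes that the triplets attached to an edge are those with one leaf in each of the two subtrees below the edge and a third leaf in the sibling subtree, since these are the triplets that become unresolved if the edge is collapsed; this reading, the function names and the counts are assumptions. The global support is not covered.

```python
def trip(x, y, z):
    """Key for the rooted triplet xy|z."""
    return (frozenset((x, y)), z)


def local_support(left, right, sibling, n_plus):
    """sum N+ / (sum N+ + sum N-) over the triplets xy|z assumed to depend on the edge."""
    plus = minus = 0
    for x in left:
        for y in right:
            for z in sibling:
                plus += n_plus.get(trip(x, y, z), 0)
                minus += n_plus.get(trip(x, z, y), 0) + n_plus.get(trip(y, z, x), 0)
    total = plus + minus
    return plus / total if total else 0.0


# Hypothetical counts: AB|C seen three times, AC|B seen once.
n_plus = {trip("A", "B", "C"): 3, trip("A", "C", "B"): 1}
support = local_support({"A"}, {"B"}, {"C"}, n_plus)
print(support)
```

With these counts the edge grouping A and B gets a local support of 0.75, above the 0.5 threshold below which collapsing would be preferred.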

T

Page 16: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: simulation protocol

Are they similar?

Triplet/split measure

[Eulenstein et al. 2004] [Criscuolo et al. 2006]

Page 17: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: simulation results

Figure: simulation results, measured on splits and on triplets. Annotations: "less resolved, very few errors"; "contain errors"; "lack of resolution"; "perfect".

Page 18: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: phylogenomic case study

Supertree of 33 mammals
• Species: complete genomes (EnsEMBL v54)
• Sequences: orthologous CDS (OrthoMaM v5)
• Gene trees: 13 000 ML trees (inferred using PAUP)

Output supertree
• Computed in 30 s
• Congruent with [Prasad et al. 2008]

Page 19: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Conclusion & prospects

(Asymmetric) median supertree
• Easy to understand
• Makes tree weighting natural

MRP, triplets and median supertree
• Understanding the criterion optimized by MRP
• Designing a dedicated algorithm to optimize it
• http://www.supertriplets.univ-montp2.fr/

Supertrees & supermatrix are complementary
• 1 000 vertebrate genome project
• Divide and conquer approach:
  i) trees based on multiple CDSs (supermatrix)
  ii) assembling those trees (supertree)

Page 20: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertriplets: http://www.supertriplets.univ-montp2.fr/


Page 21: SuperTriplets:  a triplet-based supertree approach to phylogenomics


Supertree overview: asymmetric median tree

Figure: two forests F1 and F2 of trees on taxa A, B, C, D, E, compared with candidate trees and a reference tree (REF). Distance labels: d(F1, ·) = d(· + ·); d(F1, ·) = 3 * d(· + ·); d(F2, ·) = 3 * d(· + ·); d(F2, ·) = d(· + ·).