Post on 29-Dec-2015
Coalescence and the Cenancestor
J. Peter GogartenUniversity of Connecticut
Department of Molecular and Cell Biology
Acknowledgements
NSF Microbial GeneticsNASA Exobiology Program
Re. Coalescence:Andrew Martin (U of C Boulder)
Joe Felsenstein (U of Wash)Hyman Hartman (MIT)
Yuri Wolf (NCBI)
Gogarten Lab:Olga Zhaxybayeva
Cenancestor – (from the Greek “kainos” meaning recent and
“koinos” meaning common) – the most recent common
ancestor of all the organisms that are alive today. The term
was proposed by Fitch in 1987.
All life forms on Earth share common ancestry
Where is the root of the tree of life?
• On the branch leading to Bacteria (Gogarten, J.P. et al.,1989; Iwabe, N. et al., 1989)
• On the branch leading to Archaea (Woese, C.R.,1987)
• On the branch leading to Eukaryotes (Forterre, P. and Philippe, H., 1999; Lopez, P. et al., 1999)
• Under aboriginal trifurcation (Woese, C.R., 1978)
• Inconclusive results (Caetano-Anolles, G., 2002)
Detection of Ancient Gene Duplications using Whole Genome Data
SELECTION OF GENOMES12 representative completed genomes
SELECTION OF GENOMES12 representative completed genomes
RANKING OF PUTATIVE ANCIENT PARALOGOUS PAIRSElimination of duplicates; ranking by number of occurrences
RANKING OF PUTATIVE ANCIENT PARALOGOUS PAIRSElimination of duplicates; ranking by number of occurrences
ANALYSIS OF PUTATIVE ANCIENT PARALOGOUS PAIRSAdd homologous sequences, perform alignment and reconstruct phylogenetic trees
ANALYSIS OF PUTATIVE ANCIENT PARALOGOUS PAIRSAdd homologous sequences, perform alignment and reconstruct phylogenetic trees
DETECTION OF PUTATIVE ANCIENT PARALOGOUS PAIRSDETECTION OF PUTATIVE ANCIENT PARALOGOUS PAIRS
Encoded proteins,
Genome A
Encoded proteins,
Genome A
Encoded proteins,
Genome B
Encoded proteins,
Genome B
BLAST,E-value cutoff 10-8
(A1,B1) and (A2,B2) are putativeanciently
duplicated genes
(A1,B1) and (A2,B2) are putativeanciently
duplicated genes
Gene A1Gene A1 Gene B1Gene B1
Gene A2Gene A2 Gene B2Gene B2
Reciprocal top scoring BLAST hit
Reciprocal top scoring BLAST hit
Ancient Gene Duplications and Root of Tree of Life
Root LocationTotal
number of datasets
Reliable Ambiguous or
Unresolved
Between Domains 32 7 25
Within Archaea 3 2 1
Within Bacteria 9 0 9
Polyphyletic, only Archaeal sequences group with Archaeal ortholog of the A-B pair
17 1 16
Polyphyletic, only Bacterial sequences group with Bacterial ortholog of the A-B pair
14 5 9
Polyphyletic on both sides of root
11 0 11
SS
U-r
RN
A T
ree
of L
i fe
EuglenaTrypanosoma
Zea
Paramecium
Dictyostelium
EntamoebaNaegleria
Coprinus
Porphyra
Physarum
HomoTritrichomonas
Sulfolobus
ThermofilumThermoproteus
pJP 27pJP 78
pSL 22pSL 4
pSL 50
pSL 12
E.coli
Agrobacterium
Epulopiscium
AquifexThermotoga
Deinococcus
Synechococcus
Bacillus
Chlorobium
Vairimorpha
Cytophaga
HexamitaGiardia
mitochondria
chloroplast
Haloferax
Methanospirillum
Methanosarcina
Methanobacterium
ThermococcusMethanopyrus
Methanococcus
ARCHAEA BACTERIA
EUCARYA
Encephalitozoon
Thermus
EM 17
0.1 changes per nt
Marine group 1
RiftiaChromatium
ORIGIN
Treponema
CPSV/A-ATPaseProlyl RSLysyl RSMitochondriaPlastids
Fig. modified from Norman Pace
Another example of topology with long empty branches connecting three domains of life –
proton pumping ATPases
Olendzenski et al, 2000
0.1
BfTreponemaBArabidops
BHomoBSaccharomyces
BCandidaBNeurospora373
744812
1000
BEnterococcosBBMethanosarcina
BSulfolobuBDesulfurococcus
206
BDeinococcusBThermus957
1000
alpE.colialpVibrio1000
alpPropionigeniumalpSynechococcus
alpNicotianaalpBos
alpSchizosalpSaccharom.
alpNeurospora830927610
949
alpMycobacteriumalpThermotoga
alpPS3441593
388
541
1000
838
FfBorreliaFlSalmonellaFTreponema
FlBacillus901951
1000
betAquifexbetMycoplasma
betThermotogabetE.coli
betVibrio997
betEnterococcusbetPS3971
betWolinellabetChlorobium
betPectinatusbetPropionigenium
betBosbetNicotiana616
betSaccharomycesbetNeurospora
BetSchizossacch.388315
962
88
174
442564
1000
ASulfolobusAfTreponema378
AEnterococcusABMethanosarcinaAMMethanosarcina1000
AHaloferaxAHalobacterium1000
984
AjMethanoccusADesulfurococcus989
370268
417
AThermusADeinococcus1000
448
AGiardiaAPlasmodium
ATrypanosoACyanidium
AArabidopsAVigna
ADaucusAGossypium
8271000
A1AcetabulariaA2Acetabularia
1000999
914
AGallus A1AMusA1Homo1
ABos1
1000
ADrosophilaAAedes
AManduca7271000
1000
ANeurosporaASaccharomycesACandida1000
1000
262
ADictyosteliumAEntamoeba
282
299
513
954
973
1000
901
901
Flagellar Assembly ATPases
Non C
atalytic Subunits
Catalytic S
ubunits
Bacterial (F
-)A
rchaeal (A-) &
Vacuolar (v-)
Bacterial (F
-)A
rchaeal (A-)
Vacuolar (V
-) AT
Pase S
U
Phylogenetic tree of phenylalanyl-tRNA synthetase beta-subunits.
J.R. Brown, Syst. Biol. 50(4):497–512 , 2001
H. Philippe, P. Forterre, J Mol Evol (1999) 49:509–523
Phylogenetic tree of Ile- and Val-tRNA synthetases
Hypotheses to explain the long empty branches connecting the three domains of life
Early evolution prior to the split of the three domains was following different mechanisms.
• Wolfram Zillig, 1992
• Otto Kandler, 1994
• Carl Woese, 1998
• Arthur Koch, 1994
Hypotheses to explain the long empty branches connecting the three domains of life (cont.)
There were major catastrophic events in the past that led to a bottleneck with only few survivors
For example:
•Tail of early heavy bombardment
•Rise of oxygen
Hypotheses to explain the long empty branches connecting the three domains of life (cont.)
There were major catastrophic events in the past that led to a bottleneck with only few survivors
Hypotheses to explain the long empty branches connecting the three domains of life (cont.)
Coalescence – the
process of tracing
lineages backwards
in time to their
common ancestors.
Every two extant
lineages coalesce
to their most recent
common ancestor.
Eventually, all
lineages coalesce
to the cenancestor.
t/2(Kingman, 1982)
Illustration is from J. Felsenstein, “Inferring Phylogenies”, Sinauer, 2003
Clade – (from the Greek “klados” meaning branch or twig) –
a group of organisms that includes all of the descendants of
an ancestral taxon. In a rooted phylogeny every node
defines a clade as the lineages originating from this node,
including those that arise in successive furcations.
Coalescence as an Approach to Study Cladogenesis
Cladogenesis – the process of clade formation
clade
Simulations of cladogenesis by coalescence
•One extinction and one speciation event per generation
•Horizontal transfer event once in 10 generations
RED: organismal lineages (no HGT)BLUE: molecular lineages (with HGT)GRAY: extinct lineages
RESULTS:•Most recent common ancestors are different for organismal and molecular phylogenies
•Different coalescence times
•Long coalescence of the last two lineages
EX
TA
NT
LIN
EA
GE
S F
OR
TH
E S
IMU
LAT
ION
S O
F 5
0 LI
NE
AG
ES
green: organismal lineages ; red: molecular lineages (with gene transfer)
Number of extant lineages over time
Adam and Eve never met
Albrecht Durer, The Fall of Man, 1504
MitochondrialEve
Y chromosomeAdam
Lived approximately
50,000 years ago
Lived 166,000-249,000
years ago
Thomson, R. et al. (2000) Proc Natl Acad Sci U S A 97, 7360-5
Underhill, P.A. et al. (2000) Nat Genet 26, 358-61
Cann, R.L. et al. (1987) Nature 325, 31-6
Vigilant, L. et al. (1991) Science 253, 1503-7
CONCLUSIONS•Coalescence leads to long branches at the root.
•Using single genes as phylogenetic markers makes it difficult to trace organismal phylogeny in the presence of horizontal gene transfer.
•Each contemporary molecule has its own history and traces back to an individual molecular cenancestor, but these molecular ancestors were likely to be present in different organisms at different times.
•The model alone already explains some features of the observed topology of the tree of life. Therefore, it does not appear warranted to invoke more complex hypotheses involving bottlenecks and extinction events to explain these features of the tree of life.
OUTLOOK
• Incorporate non-random Horizontal Gene Transfer into the
model.
• Consider the model as a null hypothesis and look for
deviations from it. According to the model the bottom
branches are expected to be long for every individual clade.
This does not seem to be the case for bacterial domain. This
deviation could be due to sudden radiation.
• Explore if the model can be used to infer parameters that
describe the early evolution of life. E.g., number of species.