MicroOpinion
Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery
David Moreira
UPRES-A 8080, Equipe Phylogénie et Evolution
Moléculaires, Bâtiment 444, Université Paris-Sud,
91405 Orsay Cedex, France.
Summary
In contrast to the universality of other central genetic mechanisms, the replication machinery of Bacteria is clearly different from those of Archaea and Eukaryotes. A large number of bacterial genes involved in DNA replication can also be found in plasmids and phages. Based on this, it has been recently proposed that the ancestral bacterial genes were displaced by non-orthologous replication genes from plasmids and phages, which would explain the profound difference between Bacteria and the other domains of life. The alternative hypothesis is that these DNA replication genes have been frequently transferred from bacterial hosts to the genomes of their plasmids and phages. The phylogenetic analysis of the bacterial DNA replication proteins most abundant in databases (replicative helicase DnaB, single-strand binding protein Ssb and topoisomerase TopB) presented here supports the latter hypothesis. Each protein tree shows that sequences from plasmids and phages branch close to their specific bacterial hosts, suggesting multiple independent horizontal transfers. Therefore, there is no evidence so far for non-orthologous gene displacement of these genes.
Introduction
An intriguing observation puzzles evolutionary biologists, in particular after the completion of the genome sequences from a variety of organisms. Bacteria are endowed with a DNA replication machinery clearly different from that of Archaea and Eukaryotes. Indeed, whereas the translation apparatus and a few central components of the transcription system are basically universal, most proteins involved in bacterial DNA replication seem to lack archaeal and eukaryotic homologues (Olsen and Woese, 1996). This is also the case for several bacterial proteins related to recombination and repair (Olsen and Woese, 1997; Aravind et al., 1999). Until recently, two alternative hypotheses had been proposed to explain this striking disparity. One was that the last common ancestor of all extant organisms (or cenancestor) possessed an RNA genome, and that DNA appeared independently in the two branches stemming from it (Bacteria and Archaea/Eukaryotes). Accordingly, replication and related machineries would also have independent and non-homologous origins (Mushegian and Koonin, 1996). However, the similarities between domains concerning many other DNA-related proteins, although limited, are probably sufficient to support the hypothesis of a DNA-based cenancestor (Olsen and Woese, 1996). The alternative hypothesis envisaged that the DNA replication machinery was poorly developed at the cenancestral stage. Therefore, no central components would have been universally conserved, or their sequences would have diverged to such an extent that their homology is at present imperceptible (Olsen and Woese, 1996). Forterre has recently criticized this view, arguing that there is no evident reason to explain why, despite their involvement in common mechanisms, some proteins have evolved very rapidly (becoming apparently unrelated) whereas others have evolved slowly, producing still recognisable orthologues (Forterre, 1999). He proposes instead a new hypothesis based on non-orthologous gene displacement of the bacterial replicative proteins by foreign analogues. Non-orthologous gene displacement refers to the cases in which non-orthologous (unrelated or distantly related) proteins are fully responsible for an essential function in different species (Koonin et al., 1996). Forterre further suggests that plasmids and viruses were the donors of the non-orthologous genes that displaced the bacterial ones, as many of them
Molecular Microbiology (2000) 35(1), 1-5
© 2000 Blackwell Science Ltd
Received 12 August, 1999; revised 23 September, 1999; accepted 29 September, 1999. E-mail [email protected]; Tel. (+33) 1691 56805; Fax (+33) 1691 56803.
carry a plentiful diversity of genes involved in replication, transcription, recombination or repair (for examples, see Forterre, 1999). In some cases, these genes are specifically related to the homologues from the particular host of the plasmid or virus but, in others, this specific relationship seems not to exist. In such cases, Forterre proposes that those plasmid/virus genes could have originated independently or diverged very early in evolution or very rapidly. As the diversity of these genes is larger than that of their bacterial counterparts, he concludes that the preferred direction of displacement should have been plasmid/virus to bacteria. He provides some examples of bacterial replication proteins with possible plasmidic or viral origin, such as DnaB, the family DnaC/DnaI/DnaA, DnaG, DnaE, DnaD, Ssb and TopB (see Table 1 in Forterre, 1999).
Forterre's hypothesis has an advantage over the two others cited above: it is amenable to testing by phylogenetic analysis. If the hypothesis is correct, the phylogenetic analysis of each protein of putative plasmidic or viral origin should yield a tree in which the bacterial sequences form a monophyletic group branching off from the plasmidic or viral donor sequence. Interestingly, there is a bona fide case of non-orthologous gene displacement affecting a large group of organisms, which may be used as a model for this testing: mitochondrial and chloroplastic RNA polymerases. All known mitochondria but one possess a bacteriophage T3/T7-type single-subunit RNA polymerase (encoded also by several mitochondrial linear plasmids), instead of a classical bacterial-type multiple-subunit RNA polymerase. Chloroplasts possess both T3/T7- and bacterial-type polymerases (Gray and Lang, 1998). The bacterial ancestor of mitochondria acquired this RNA polymerase from a bacteriophage or a linear plasmid, and the acquired polymerase displaced the host's enzyme. A phylogenetic tree of the chloroplast, mitochondrial, bacteriophage and plasmidic sequences displays the expected situation (Fig. 1). Mitochondrial sequences form a clear monophyletic group, with chloroplast sequences branching off as a sister group of plant mitochondria (indicating that chloroplasts acquired the bacteriophage-type polymerase by a secondary transfer from mitochondria). Bacteriophage and plasmid sequences form two independent groups and therefore it is not possible to decide whether mitochondrial RNA polymerases have a bacteriophage or plasmidic origin based only on the presently available sequences. If bacteriophage or plasmid sequences were intermixed with those of organelles, multiple horizontal transfers could be inferred.
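The test outlined above can be illustrated with a minimal Python sketch. It is not the procedure used for Fig. 1; it only shows the logic of the prediction. Trees are written as nested tuples, the leaf names and both toy topologies are hypothetical, and the function name `forms_split` is an assumption introduced for this example. Because the trees are unrooted, a group is treated as monophyletic when a single branch separates it from all other sequences.

```python
def leaf_sets(tree):
    """Return (leaves of `tree`, list of the leaf sets of all its subtrees)."""
    if isinstance(tree, str):  # a leaf is just its label
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def forms_split(tree, group):
    """True if one branch of the (unrooted) tree separates `group` from the rest."""
    group = frozenset(group)
    all_leaves, clades = leaf_sets(tree)
    # The nested-tuple root is arbitrary, so test the group and its complement.
    return group in clades or (all_leaves - group) in clades


# Hypothetical topology resembling a displacement: organelle sequences cluster together.
displacement_like = ((("mito_A", "mito_B"), "mito_C"), ("phage_1", "phage_2"), "plasmid_1")
# Hypothetical topology resembling multiple transfers: each mobile element sits next to its host.
transfer_like = (("bact_A", "phage_of_A"), ("bact_B", "plasmid_of_B"), "bact_C")

print(forms_split(displacement_like, {"mito_A", "mito_B", "mito_C"}))
print(forms_split(transfer_like, {"bact_A", "bact_B", "bact_C"}))
```

In the first toy tree the cellular sequences are separated from the mobile elements by a single branch, as expected after one displacement event; in the second they are not, which is the pattern that multiple independent transfers would produce.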
Phylogenetic analysis of bacterial replication proteins
I have analysed in a similar way all the bacterial replication proteins for which Forterre proposes a possible plasmidic or viral origin (see above). I present here those producing the most significant results, namely, those having the largest number of available sequences, including a variety of bacteria, plasmids and phages, although the other genes yielded similar results (not shown). These are the replicative helicase (DnaB), the single-strand binding protein (Ssb) and the topoisomerase III (TopB) (Fig. 1). In all cases, a preliminary phylogenetic analysis, carried out using conventional methods of phylogenetic reconstruction (neighbour joining and maximum parsimony), revealed the existence of strong differences of evolutionary rate among sequences. Phage and plasmid sequences were especially affected, showing in most cases very long branches (data not shown). This represents an important problem for phylogenetic reconstruction, because long branches tend to be misplaced and to group together in trees (the long branch attraction artefact, described by Felsenstein, 1978). In addition, sequences also exhibited levels of among-site rate variation (i.e. differences of variation rate for the different sequence positions) that could produce incorrect phylogenetic trees (Yang, 1996). In fact, the α parameter values of the gamma law, estimated using the program PUZZLE, version 4.0.2 (Strimmer and von Haeseler, 1996), were 1.13, 3.86 and 1.14 for the DnaB, Ssb and TopB data sets respectively. (Note that α may vary between 0, for data sets with intense among-site rate variation, and infinity, for data sets in which all sites evolve at the same rate.) Again, phage and plasmid sequences contributed especially to this problem, as data sets excluding them showed higher α values (1.83, 9.43 and 1.92 for bacterial DnaB, Ssb and TopB respectively). Consequently, the plasmid and phage sequences were clearly a potential cause of artefacts for classical phylogenetic reconstruction methods. I opted, therefore, for a more reliable approach: maximum likelihood (the method least sensitive to long branch attraction) with a gamma law (to correct for among-site rate variation), using the program PUZZLE 4.0.2. When very close species
Fig. 1. Maximum likelihood phylogenetic trees for T3/T7-type RNA polymerases, DnaB, Ssb and TopB sequences. Bold branches correspond to phage (,) and plasmid (X) sequences. Names in brackets indicate the specific hosts of the respective phages, viruses or plasmids. Species names marked with `cp' and `mt' correspond to chloroplast and mitochondrial sequences respectively. Trees were constructed using the program PUZZLE version 4.0.2, with the JTT model of substitution and 16 gamma rate categories (the program description is available at http://members.tripod.de/korbi/puzzle/manual.html). Only unambiguous aligned regions of the proteins were used, and constant positions were removed (alignments are available on request). All trees are unrooted, except the TopB tree, rooted using eukaryotic and archaeal homologous sequences. Bars correspond to 10 amino acid substitutions per site.
were present (e.g. Escherichia coli and Salmonella typhimurium), only one representative was kept to accelerate the time-consuming maximum likelihood calculations.
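To make the meaning of the α values quoted above more concrete, the following Python sketch converts each α into the coefficient of variation of site rates. It relies on a standard property of the gamma distribution rather than on anything computed by PUZZLE: for a gamma law with shape α and mean 1, the variance of the rates is 1/α, so the coefficient of variation is 1/√α. The α values are those reported in the text; the dictionary and function names are assumptions made for this illustration.

```python
import math

# Alpha values as reported in the text (estimated with PUZZLE 4.0.2).
ALPHA_ALL_SEQUENCES = {"DnaB": 1.13, "Ssb": 3.86, "TopB": 1.14}
ALPHA_BACTERIA_ONLY = {"DnaB": 1.83, "Ssb": 9.43, "TopB": 1.92}


def rate_cv(alpha):
    """Coefficient of variation of site rates for a gamma law with mean 1."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Variance of a mean-1 gamma distribution is 1/alpha, so CV = 1/sqrt(alpha).
    return 1.0 / math.sqrt(alpha)


for protein in ALPHA_ALL_SEQUENCES:
    cv_all = rate_cv(ALPHA_ALL_SEQUENCES[protein])
    cv_bacteria = rate_cv(ALPHA_BACTERIA_ONLY[protein])
    print(f"{protein}: CV with plasmids/phages = {cv_all:.2f}, "
          f"bacteria only = {cv_bacteria:.2f}")
```

A smaller α gives a larger coefficient of variation, so for all three proteins the data sets that include plasmid and phage sequences show the stronger among-site rate heterogeneity, in line with the argument above.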
For DnaB and Ssb, databases already contained an abundant number of sequences, including, together with those from bacteria, several from phages, plasmids and organelles (chloroplasts or mitochondria). The phylogenetic trees obtained using these sequences show a picture completely different from that derived from the RNA polymerase data set (Fig. 1). Bacterial sequences do not form a monophyletic group, but are intermixed with plasmid and phage sequences. Furthermore, plasmids and phages branch close to their specific hosts. This is so not only in cases for which there is an evident sequence similarity (e.g. the DnaB of E. coli and its phage P1), but even for very rapidly evolving plasmid and phage sequences. For instance, the rapidly evolving DnaB sequences of the plasmid lp28-2 of Borrelia burgdorferi, of diverse Chlamydia sp. plasmids or those of the bacteriophages SPP1 of Bacillus subtilis and MAV1 of Mycoplasma arthritidis cluster with their respective hosts. Notably, even the extremely rapidly evolving sequences of the coliphages HK022, T3 and T7 branch close to the γ-Proteobacteria (E. coli and Haemophilus influenzae; Fig. 1). The reliability of the results is supported by the position of the organelle sequences. As expected, the chloroplast DnaB sequences group specifically with the cyanobacterium Synechocystis sp., whereas the mitochondrial Ssb sequences branch close to the α-Proteobacteria (Brucella abortus, Rhodobacter sphaeroides, Rickettsia prowazekii, and Sphingomonas aromaticivorans).
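The statement that plasmids and phages branch close to their specific hosts can be illustrated with a small Python sketch. It is a deliberate simplification: the conclusions above rest on tree topology, whereas the sketch only finds, for each mobile element, the cellular sequence at the smallest patristic distance (the sum of branch lengths along the path joining two leaves). The toy tree, its labels and its branch lengths are hypothetical, as are the function names.

```python
def patristic_distances(edges, start):
    """Sum of branch lengths from `start` to every other node of the tree."""
    adjacency = {}
    for node_a, node_b, length in edges:
        adjacency.setdefault(node_a, []).append((node_b, length))
        adjacency.setdefault(node_b, []).append((node_a, length))
    distances = {start: 0.0}
    stack = [start]
    while stack:  # depth-first walk; a tree has exactly one path between two nodes
        node = stack.pop()
        for neighbour, length in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + length
                stack.append(neighbour)
    return distances


def closest_cellular_leaf(edges, mobile_leaf, cellular_leaves):
    """Cellular sequence at the smallest patristic distance from `mobile_leaf`."""
    distances = patristic_distances(edges, mobile_leaf)
    return min(cellular_leaves, key=lambda leaf: distances[leaf])


# Hypothetical unrooted tree: (node, node, branch length). The long branches of
# the phage leaves mimic their accelerated evolutionary rate.
TOY_EDGES = [
    ("n1", "host_A", 0.10),
    ("n1", "phage_of_A", 0.90),
    ("n1", "n2", 0.40),
    ("n2", "host_B", 0.15),
    ("n2", "phage_of_B", 1.20),
]

for phage in ("phage_of_A", "phage_of_B"):
    print(phage, "->", closest_cellular_leaf(TOY_EDGES, phage, ["host_A", "host_B"]))
```

In this toy case each phage sequence remains closest to its own host despite its long branch, which is the pattern described for DnaB and Ssb.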
The situation was, a priori, different for the bacterial topoisomerase TopB, given that it was only known in an apparently very restricted number of bacteria (four species: the γ-proteobacteria E. coli and H. influenzae and the Gram-positive Bacillus firmus and B. subtilis), but had many homologues encoded by plasmids (interestingly, some of them hosted by γ-proteobacteria or by Bacillus species). Forterre considered this disproportion as additional evidence for a non-orthologous gene displacement case. However, a thorough BLAST search of all available databases (including the NCBI at http://www.ncbi.nlm.nih.gov/ and those containing unfinished genome sequence data: TIGR at http://www.tigr.org/; Sanger Centre at http://www.sanger.ac.uk/; Genome Therapeutics at http://www.cric.com/; Washington University at http://genome.wustl.edu/; and University of Minnesota at http://www.cbc.umn.edu/) allowed the identification of 20 new complete bacterial TopB homologues. They were present in the genomes of β-Proteobacteria (Bordetella bronchiseptica and B. pertussis), γ-Proteobacteria (Klebsiella pneumoniae, Pasteurella multocida, Pseudomonas putida, Salmonella typhimurium, Shewanella putrefaciens, Vibrio cholerae, Yersinia pestis, and two different copies in Salmonella typhi), Cytophagales (Porphyromonas gingivalis) and Gram-positives (Enterococcus faecalis, Staphylococcus aureus, two copies in Clostridium acetobutylicum and three copies in Clostridium difficile). In addition, partial sequences can be identified in the α-proteobacterium Caulobacter crescentus and in the γ-proteobacterium Thiobacillus ferrooxidans. Therefore, TopB seems to be much more widespread than previously thought, and its apparently restricted distribution is no longer a sound argument for its origin through non-orthologous gene displacement. The phylogenetic analysis of the previously available sequences, together with a large sample of the new sequences, yielded a tree topology similar to those of the DnaB and Ssb sequences (Fig. 1). Once again, plasmid sequences appeared intermixed with bacterial ones. In some cases, there was a clear grouping of the plasmid sequence with its specific host or a close relative. For instance, the plasmids RP4 and R751 branched within the γ-Proteobacteria, whereas the plasmid pX01 was the closest relative of B. subtilis. Even the very rapidly evolving plasmids pMRC01, pDB101, pAMβ1 and pSK41 branched within the Gram-positives. However, in this case there was not a perfect correlation with their hosts, although evidence suggests that this is a result of very recent horizontal transfers of these plasmids among different Gram-positive species. For instance, the TopB sequences of the plasmids pDB101 and pAMβ1 are almost identical (94% identity), although their hosts (Streptococcus pyogenes and E. faecalis respectively) are not so closely related. Clearly, one of these plasmids has very recently acquired this gene.
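For readers unfamiliar with the identity figure quoted above, the following Python sketch shows one common way of computing percentage identity between two aligned protein sequences. The convention used here (ignoring columns in which either sequence has a gap) is an assumption, since the text does not state how the 94% value was obtained, and the short sequences are invented for illustration only.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over the ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented aligned fragments: 9 of the 10 compared positions are identical.
print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give slightly different values, so identity figures are only comparable when computed in the same way.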
Conclusions
In the three cases described above, the phylogenetic analyses do not lend any support to the putative non-orthologous displacement of bacterial replicative genes by plasmid or phage genes. On the contrary, the trees obtained support the current idea that multiple horizontal transfers of genes from bacteria to their specific plasmids or phages have occurred. Thus, bacteria to plasmid/phage transfer of a variety of informational genes seems to be a very common phenomenon. Subsequently, a significant evolutionary rate acceleration of the transferred genes generally occurs, as attested by the long branches they usually exhibit (Fig. 1). This is in agreement with the observation that phages normally have faster evolutionary rates than their cellular hosts (Drake et al., 1998).
Also, viruses that infect eukaryotic cells carry a diversity of homologues of eukaryotic-type informational proteins. These include type IB topoisomerase, primase, DNA ligase, subunits of RNA polymerase II, transcription factor SII, translation initiation factor 2α, RNase H1, mRNA capping enzymes, 26S protease, HMG non-histone chromosomal
proteins, uracil-DNA glycosylase, dUTPase and ribonucleotide reductase (data not shown). A non-orthologous eukaryotic gene displacement by genes of viral origin could also be advanced. However, as in the cases studied here for bacteria, the eukaryotic-type genes present in viruses appear to be the result of horizontal transfers from their particular hosts (data not shown).
In conclusion, the evolutionary enigma concerning the deep difference between the bacterial and the archaeal/eukaryotic replication machineries remains open. Although non-orthologous gene displacement may explain some specific incongruities at a smaller evolutionary scale, there is no current evidence for its connection with this macroevolutionary problem. Nevertheless, this could change with the discovery of putative phages or plasmids carrying true ancestral sequences, and therefore the research on this subject should continue. On the other hand, the hypothesis of an early acceleration of evolutionary rate leading to the impossibility of recognizing the homology between replicative proteins of Bacteria and Archaea/Eukaryotes should not be discarded. For instance, the eukaryotic processivity factor, PCNA, and the bacterial Pol β have very dissimilar amino acid sequences, but the use of crystallographic data has allowed the recognition of their homology (Kelman and O'Donnell, 1995). In addition, the use of new sensitive search methods is bringing to light superfamilies of homologous proteins that only a few years ago remained completely unseen (Altschul et al., 1997). This could be the case of other replication proteins.
Acknowledgements
I thank Purificación López for critical reading of the manuscript and the Fondation des Treilles for support. Preliminary sequence data obtained from The Institute for Genomic Research, Genome Therapeutics, Sanger Centre, University of Minnesota, and University of Washington are acknowledged.
References
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-3402.
Aravind, L., Walker, D.R., and Koonin, E.V. (1999) Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res 27: 1223-1242.
Drake, J.W., Charlesworth, B., Charlesworth, D., and Crow, J.F. (1998) Rates of spontaneous mutation. Genetics 148: 1667-1686.
Felsenstein, J. (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27: 401-410.
Forterre, P. (1999) Displacement of cellular proteins by functional analogues from plasmids or viruses could explain puzzling phylogenies of many DNA informational proteins. Mol Microbiol 33: 457-465.
Gray, M.W., and Lang, B.F. (1998) Transcription in chloroplasts and mitochondria: a tale of two polymerases. Trends Microbiol 6: 1-3.
Kelman, Z., and O'Donnell, M. (1995) Structural and functional similarities of prokaryotic and eukaryotic DNA polymerase sliding clamps. Nucleic Acids Res 23: 3613-3620.
Koonin, E.V., Mushegian, A.R., and Bork, P. (1996) Non-orthologous gene displacement. Trends Genet 12: 334-336.
Mushegian, A.R., and Koonin, E.V. (1996) A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA 93: 10268-10273.
Olsen, G.J., and Woese, C.R. (1996) Lessons from an Archaeal genome: what are we learning from Methanococcus jannaschii? Trends Genet 12: 377-379.
Olsen, G.J., and Woese, C.R. (1997) Archaeal genomics: an overview. Cell 89: 991-994.
Strimmer, K., and von Haeseler, A. (1996) Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13: 964-969.
Yang, Z. (1996) Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol 11: 367-370.