Multiple independent horizontal transfers of informational genes from bacteria to plasmids and...

5
MicroOpinion Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery David Moreira UPRES-A 8080, Equipe Phyloge ´ nie et Evolution Mole ´ culaires, Ba ˆ timent 444, Universite ´ Paris-Sud, 91405 Orsay Cedex, France. Summary In contrast to the universality of other central genetic mechanisms, the replication machinery of Bacteria is clearly different from those of Archaea and Eukary- otes. A large number of bacterial genes involved in DNA replication can also be found in plasmids and phages. Based on this, it has been recently proposed that the ancestral bacterial genes were displaced by non-orthologous replication genes from plasmids and phages, which would explain the profound differ- ence between Bacteria and the other domains of life. The alternative hypothesis is that these DNA replica- tion genes have been frequently transferred from bac- terial hosts to the genomes of their plasmids and phages. The phylogenetic analysis of the bacterial DNA replication proteins most abundant in databases (replicative helicase DnaB, single-strand binding pro- tein Ssb and topoisomerase TopB) presented here sup- ports the latter hypothesis. Each protein tree shows that sequences from plasmids and phages branch close to their bacterial-specific hosts, suggesting multiple independent horizontal transfers. Therefore, there is no evidence so far for non-orthologous gene displacement of these genes. Introduction An intriguing observation puzzles evolutionary biologists, in particular after the completion of the genome sequences from a variety of organisms. Bacteria are endowed with a DNA replication machinery clearly different from Archaea and Eukaryotes. Indeed, whereas the translation apparatus and a few central components of the transcription system are basically universal, most proteins involved in bacterial DNA replication seem to lack archaeal and eukaryotic homologues (Olsen and Woese, 1996). This is also the case for several bacterial proteins related to recombina- tion and repair (Olsen and Woese, 1997; Aravind et al ., 1999). To date, two alternative hypotheses had been pro- posed to explain the striking disparity. One was that the last common ancestor of all extant organisms (or cenances- tor) possessed an RNA genome, and that DNA appeared independently in the two branches stemming from it (Bac- teria and Archaea/Eukaryotes). Accordingly, replication and related machineries would also have independent and non-homologous origins (Mushegian and Koonin, 1996). However, the similarities between domains concern- ing many other DNA-related proteins, although limited, are probably sufficient to support the hypothesis of a DNA-based cenancestor (Olsen and Woese, 1996). The alternative hypothesis envisaged that the DNA replication machinery was poorly developed at the cenancestral stage. Therefore, no central components would have been univer- sally conserved, or their sequences would have diverged to such an extent that their homology is at present imper- ceptible (Olsen and Woese, 1996). Forterre has recently criticized this view, arguing that there is no evident reason to explain why, despite their involvement in common mech- anisms, some proteins have evolved very rapidly (becom- ing apparently unrelated) whereas others have evolved slowly, producing still recognisable orthologues (Forterre, 1999). He proposes instead a new hypothesis based on non-orthologous gene displacement of the bacterial repli- cative proteins by foreign analogues. Non-orthologous gene displacement refers to the cases in which non-ortho- logous (unrelated or distantly related) proteins are fully responsible for an essential function in different species (Koonin et al ., 1996). Forterre further suggests that plas- mids and viruses were the donors of the non-orthologous genes that displaced the bacterial ones, as many of them Molecular Microbiology (2000) 35(1), 1–5 Q 2000 Blackwell Science Ltd Received 12 August, 1999; revised 23 September, 1999; accepted 29 September, 1999. E-mail [email protected]; Tel. (33) 1691 56805; Fax (33) 1691 56803.

Transcript of Multiple independent horizontal transfers of informational genes from bacteria to plasmids and...

Page 1: Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery

MicroOpinion

Multiple independent horizontal transfers ofinformational genes from bacteria to plasmids andphages: implications for the origin of bacterialreplication machinery

David Moreira

UPRES-A 8080, Equipe PhylogeÂnie et Evolution

MoleÂculaires, BaÃtiment 444, Universite Paris-Sud,

91405 Orsay Cedex, France.

Summary

In contrast to the universality of other central genetic

mechanisms, the replication machinery of Bacteria is

clearly different from those of Archaea and Eukary-

otes. A large number of bacterial genes involved in

DNA replication can also be found in plasmids and

phages. Based on this, it has been recently proposed

that the ancestral bacterial genes were displaced by

non-orthologous replication genes from plasmids

and phages, which would explain the profound differ-

ence between Bacteria and the other domains of life.

The alternative hypothesis is that these DNA replica-

tion genes have been frequently transferred from bac-

terial hosts to the genomes of their plasmids and

phages. The phylogenetic analysis of the bacterial

DNA replication proteins most abundant in databases

(replicative helicase DnaB, single-strand binding pro-

tein Ssb and topoisomerase TopB) presented here sup-

ports the latter hypothesis. Each protein tree shows

that sequences from plasmids and phages branch

close to their bacterial-speci®c hosts, suggesting

multiple independent horizontal transfers. Therefore,

there is no evidence so far for non-orthologous gene

displacement of these genes.

Introduction

An intriguing observation puzzles evolutionary biologists,

in particular after the completion of the genome sequences

from a variety of organisms. Bacteria are endowed with a

DNA replication machinery clearly different from Archaea

and Eukaryotes. Indeed, whereas the translation apparatus

and a few central components of the transcription system

are basically universal, most proteins involved in bacterial

DNA replication seem to lack archaeal and eukaryotic

homologues (Olsen and Woese, 1996). This is also the

case for several bacterial proteins related to recombina-

tion and repair (Olsen and Woese, 1997; Aravind et al.,

1999). To date, two alternative hypotheses had been pro-

posed to explain the striking disparity. One was that the

last common ancestor of all extant organisms (or cenances-

tor) possessed an RNA genome, and that DNA appeared

independently in the two branches stemming from it (Bac-

teria and Archaea/Eukaryotes). Accordingly, replication

and related machineries would also have independent

and non-homologous origins (Mushegian and Koonin,

1996). However, the similarities between domains concern-

ing many other DNA-related proteins, although limited,

are probably suf®cient to support the hypothesis of a

DNA-based cenancestor (Olsen and Woese, 1996). The

alternative hypothesis envisaged that the DNA replication

machinery was poorly developed at the cenancestral stage.

Therefore, no central components would have been univer-

sally conserved, or their sequences would have diverged

to such an extent that their homology is at present imper-

ceptible (Olsen and Woese, 1996). Forterre has recently

criticized this view, arguing that there is no evident reason

to explain why, despite their involvement in common mech-

anisms, some proteins have evolved very rapidly (becom-

ing apparently unrelated) whereas others have evolved

slowly, producing still recognisable orthologues (Forterre,

1999). He proposes instead a new hypothesis based on

non-orthologous gene displacement of the bacterial repli-

cative proteins by foreign analogues. Non-orthologous

gene displacement refers to the cases in which non-ortho-

logous (unrelated or distantly related) proteins are fully

responsible for an essential function in different species

(Koonin et al., 1996). Forterre further suggests that plas-

mids and viruses were the donors of the non-orthologous

genes that displaced the bacterial ones, as many of them

Molecular Microbiology (2000) 35(1), 1±5

Q 2000 Blackwell Science Ltd

Received 12 August, 1999; revised 23 September, 1999; accepted 29September, 1999. E-mail [email protected]; Tel. (�33)1691 56805; Fax (�33) 1691 56803.

Page 2: Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery

carry a plentiful diversity of genes involved in replication,

transcription, recombination or repair (for examples, see

Forterre, 1999). In some cases, these genes are speci®-

cally related to the homologues from the particular host

of the plasmid or virus but, in others, this speci®c relation-

ship seems not to exist. In such cases, Forterre proposes

that those plasmid/virus genes could have originated

independently or diverged very early in evolution or very

rapidly. As the diversity of these genes is larger than

those of bacteria, he concludes that the preferred direction

of displacement should have been plasmid/virus to bac-

teria. He provides some examples of bacterial replication

proteins with possible plasmidic or viral origin, such as

DnaB, the family DnaC/DnaI/DnaA, DnaG, DnaE, DnaD,

Ssb and TopB (see Table 1 in Forterre, 1999).

Forterre's hypothesis has an advantage over the two

others cited above. It is amenable to testing by phylogenetic

analysis. If the hypothesis is correct, the phylogenetic ana-

lysis of each protein of putative plasmidic or viral origin

should yield a tree in which the bacterial sequences form

a monophyletic group branching off from the plasmidic or

viral donor sequence. Interestingly, there is a bona ®de

case of non-orthologous gene displacement affecting a

large group of organisms, which may be used as a model

for this testing: mitochondrial and chloroplastic RNA poly-

merases. All but one known mitochondria possess a bac-

teriophage T3/T7-type single-subunit RNA polymerase

(encoded also by several mitochondrial linear plasmids),

instead of a classical bacterial-type multiple-subunit RNA

polymerase. Chloroplasts possess both T3/T7- and bacter-

ial-type polymerases (Gray and Lang, 1998). The bacterial

ancestor of mitochondria acquired the RNA polymerase

from a bacteriophage or a linear plasmid that displaced

the host's enzyme. A phylogenetic tree of the chloroplast,

mitochondrial, bacteriophage and plasmidic sequences

displays the expected situation (Fig. 1). Mitochondrial sequ-

ences form a clear monophyletic group, with chloroplast

sequences branching off as a sister group of plant mito-

chondria (indicating that chloroplasts acquired the bac-

teriophage-type polymerase by a secondary transfer from

mitochondria). Bacteriophage and plasmid sequences

form two independent groups and therefore it is not poss-

ible to decide whether mitochondrial RNA polymerases

have a bacteriophage or plasmidic origin based only on

the present available sequences. If bacteriophage or

plasmid sequences were intermixed with those of orga-

nelles, multiple horizontal transfers could be inferred.

Phylogenetic analysis of bacterial replication

proteins

I have analysed in a similar way all the bacterial replication

proteins for which Forterre proposes a possible plasmidic

or viral origin (see above). I present here those produc-

ing the most signi®cant results, namely, those having the

largest number of available sequences, including a variety

of bacteria, plasmids and phages, although the other genes

yielded similar results (not shown). These are the replicative

helicase (DnaB), the single-strand binding protein (Ssb)

and the topoisomerase III (TopB) (Fig. 1). In all cases, a

preliminary phylogenetic analysis, carried out using conven-

tional methods of phylogenetic reconstruction (neighbour

joining and maximum parsimony), revealed the existence

of strong differences of evolutionary rate among sequences.

Phage and plasmid sequences were especially affected,

showing in most cases very long branches (data not

shown). This represents an important problem for phylo-

genetic reconstruction, because long branches tend to be

misplaced and to group together in trees (the long branch

attraction artefact, described by Felsenstein, 1978). In addi-

tion, sequences also exhibited levels of among-site rate var-

iation (i.e. differences of variation rate for the different

sequence positions) that could produce incorrect phylo-

genetic trees (Yang, 1996). In fact, the a parameter values

of the gamma law, estimated using the program PUZZLE,

version 4.0.2 (Strimmer and von Haeseler, 1996), were

1.13, 3.86 and 1.14 for the DnaB, Ssb and TopB data

sets respectively. (Note that a may vary between 0, for

data sets with intense among-site rate variation, and in®n-

ity, for data sets in which all sites evolve at the same rate).

Again, phage and plasmid sequences contributed espe-

cially to this problem, as data sets excluding them showed

higher a values (1.83, 9.43 and 1.92 for bacterial DnaB,

Ssb and TopB respectively). Consequently, the plasmid

and phage sequences were clearly a potential cause of

artefacts for classical phylogenetic reconstruction methods.

I opted, therefore, for a more reliable approach; maximum

likelihood (the less sensitive to long branch attraction) with

a gamma law (to correct for among-site rate variation),

using the program PUZZLE 4.0.2. When very close species

Q 2000 Blackwell Science Ltd, Molecular Microbiology, 35, 1±5

Fig. 1. Maximum likelihood phylogenetic trees for T3/T7-type RNA polymerases, DnaB, Ssb and TopB sequences. Bold branchescorrespond to phage (,) and plasmid (X) sequences. Names in brackets indicate the speci®c hosts of the respective phages, viruses orplasmids. Species names marked with `cp' and `mt' correspond to chloroplast and mitochondrial sequences respectively. Trees wereconstructed using the program PUZZLE version 4.0.2, with the JTT model of substitution and 16 gamma rate categories (the programdescription is available at http:/ /members.tripod.de/korbi/puzzle/manual.html). Only unambiguous aligned regions of the proteins were used,and constant positions were removed (alignments are available on request). All trees are unrooted, except the TopB tree, rooted usingeukaryotic and archaeal homologous sequences. Bars correspond to 10 amino acid substitutions per site.

2 D. Moreira

Page 3: Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery

Q 2000 Blackwell Science Ltd, Molecular Microbiology, 35, 1±5

Horizontal transfer of informational genes 3

Page 4: Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery

were present (e.g. Escherichia coli and Salmonella typhi-

murium ), only one representative was kept to accelerate

the time-consuming maximum likelihood calculations.

For DnaB and Ssb, databases already contained an

abundant number of sequences, including, together with

those from bacteria, several from phages, plasmids and

organelles (chloroplasts or mitochondria). The phylogenetic

trees obtained using these sequences show a picture com-

pletely different from that derived from the RNA polymerase

data set (Fig. 1). Bacterial sequences do not form a mono-

phyletic group, but are intermixed with plasmid and phage

sequences. Furthermore, plasmids and phages branch

close to their speci®c hosts. This is so not only in cases

for which there is an evident sequence similarity (e.g. the

DnaB of E. coli and its phage P1), but even for very rapidly

evolving plasmid and phage sequences. For instance, the

rapidly evolving DnaB sequences of the plasmid lp28-2

of Borrelia burgdorferi, of diverse Chlamydia sp. plasmids

or those of the bacteriophages SPP1 of Bacillus subtilis

and MAV1 of Mycoplasma arthritidis cluster with their

respective hosts. Notably, even the extremely rapidly evol-

ving sequences of the coliphages HK022, T3 and T7

branch close to the g-Proteobacteria (E. coli and Haemo-

philus in¯uenzae ; Fig. 1). The reliability of the results is

supported by the position of the organelle sequences.

As expected, the chloroplast DnaB sequences group

speci®cally with the cyanobacterium Synechocystis sp.,

whereas the mitochondrial Ssb sequences branch close

to the a-Proteobacteria (Brucella abortus, Rhodobacter

sphaeroides, Rickettsia prowazekii, and Sphingomonas

aromaticivorans ).

The situation was, a priori, different for the bacterial

topoisomerase TopB, given that it was only known in an

apparently very restricted number of bacteria (four species:

the g-proteobacteria E. coli and H. in¯uenzae and the

Gram-positive Bacillus ®rmus and B. subtilis ), but had

many homologues encoded by plasmids (interestingly,

some of them hosted by g-proteobacteria or by Bacillus

species). Forterre considered this disproportion as addi-

tional evidence for a non-orthologous gene displacement

case. However, a thorough BLAST search of all available

databases (including the NCBI at http:/ /www.ncbi.nlm.

nih.gov/ and those containing un®nished genome sequ-

ence data: TIGR at http:/ /www.tigr.org/; Sanger Center

at http:/ /www.sanger.ac.uk/; Genome Therapeutics at

http:/ /www.cric.com/; Washington University at http:/ /

genome.wustl.edu/; and University of Minnesota at http:/ /

www.cbc.umn.edu/) allowed the identi®cation of 20 new

complete bacterial TopB homologues. They were present

in the genomes of b-Proteobacteria (Bordetella bronchi-

septica and B. pertussis ), g-Proteobacteria (Klebsiella

pneumoniae, Pasteurella multicocida, Pseudomonas

putida, Salmonella typhimurium, Shewanella putrefa-

ciens, Vibrio cholerae, Yersinia pestis, and two different

copies in Salmonella typhi ), Cytophagales (Porphyro-

monas gingivalis ) and Gram-positives (Enterococcus fae-

calis, Staphylococcus aureus, two copies in Clostridium

acetobutilicum and three copies in Clostridium dif®cile ).

In addition, partial sequences can be identi®ed in the

a-proteobacterium Caulobacter crescentus and in the

g-proteobacterium Thiobacillus ferrooxidans. Therefore,

TopB seems to be much more widespread than previously

thought, and its apparently restricted distribution is no

longer a sound argument for its origin through non-ortho-

logous gene displacement. The phylogenetic analysis of

the previously available sequences, together with a large

sample of the new sequences, yielded a tree topology

similar to those of the DnaB and Ssb sequences (Fig. 1).

Once again, plasmid sequences appeared intermixed

with bacterial ones. In some cases, there was a clear group-

ing of the plasmid sequence with its speci®c host or a close

relative. For instance, the plasmids rp4 and R751 branched

within the g-Proteobacteria, whereas the plasmid pX01 was

the closest relative of B. subtilis. Even the very rapidly

evolving plasmids pMRC01, pDB101, pAMb1 and pSK41

branched within the Gram-positives. However, in this case

there was not a perfect correlation with their hosts, although

evidence suggests that this is a result of very recent hori-

zontal transfers of these plasmids among different Gram-

positive species. For instance, the TopB sequences of

the plasmids pDB101 and pAMb1 are almost identical

(94% identity), although their hosts (Streptococcus pyo-

genes and E. faecalis respectively) are not so closely

related. Clearly, one of these plasmids has very recently

acquired this gene.

Conclusions

In the three cases described above, the phylogenetic

analyses do not lend any evidence to the putative non-

orthologous displacement of bacterial replicative genes

by plasmid or phage genes. On the contrary, the trees

obtained support the current idea that multiple horizontal

transfers of genes from bacteria to their speci®c plasmids

or phages have occurred. Thus, bacteria to plasmid/

phage transfer of a variety of informational genes seems

to be a very common phenomenon. Subsequently, a sig-

ni®cant evolutionary rate acceleration of the transferred

genes generally occurs, as attested by the long branches

they usually exhibit (Fig. 1). This is in agreement with the

observation that phages normally have faster evolutionary

rates than their cellular hosts (Drake et al., 1998).

Also, viruses that infect eukaryotic cells carry a diversity

of homologues of eukaryotic-type informational proteins.

These include type IB topoisomerase, primase, DNA ligase,

subunits of RNA polymerase II, transcription factor SII,

translation initiation factor 2a, RNase H1, mRNA capping

enzymes, 26S protease, HMG non-histonal chromosomal

Q 2000 Blackwell Science Ltd, Molecular Microbiology, 35, 1±5

4 D. Moreira

Page 5: Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery

proteins, uracil-DNA glycosylase, dUTPase and ribonucleo-

tide reductase (data not shown). A non-orthologous eukary-

otic gene displacement by genes of viral origin could also

be advanced. However, as in the cases studied here for

bacteria, the eukaryotic-type genes present in viruses

appear to be the result of horizontal transfers from their

particular hosts (data not shown).

In conclusion, the evolutionary enigma concerning the

deep difference between the bacterial and the archaeal/

eukaryotic replication machineries remains open. Although

non-orthologous gene displacement may explain some spe-

ci®c incongruities at a smaller evolutionary scale, there is

no current evidence for its connection with this macroevo-

lutionary problem. Nevertheless, this could change with

the discovery of putative phages or plasmids carrying true

ancestral sequences, and therefore the research on this

subject should continue. On the other hand, the hypothesis

of an early acceleration of evolutionary rate leading to the

impossibility of recognizing the homology between replica-

tive proteins of Bacteria and Archaea/Eukaryotes should

not be discarded. For instance, the eukaryotic processivity

factor, PCNA, and the bacterial Pol b have very dissimilar

amino acid sequences, but the use of crystallographic data

has allowed the recognition of their homology (Kelman and

O'Donnell, 1995). In addition, the use of new sensitive

search methods is bringing to light superfamilies of homo-

logous proteins that only a few years ago remained com-

pletely unseen (Altschul et al., 1997). This could be the

case of other replication proteins.

Acknowledgements

I thank Puri®cacioÂn LoÂpez for critical reading of the manuscriptand the Fondation des Treilles for support. Preliminary sequ-ence data obtained from The Institute for Genomic Research,Genome Therapeutics, Sanger Centre, University of Minne-sota, and University of Washington are acknowledged.

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang,Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST andPSI-BLAST: a new generation of protein database search pro-grams. Nucleic Acids Res 25: 3389±3402.

Aravind, L., Walker, D.R., and Koonin, E.V. (1999) Conserveddomains in DNA repair proteins and evolution of repair sys-tems. Nucleic Acids Res 27: 1223±1242.

Drake, J.W., Charlesworth, B., Charlesworth, D., and Crow,J.F. (1998) Rates of spontaneous mutation. Genetics 148:1667±1686.

Felsenstein, J. (1978) Cases in which parsimony or compati-bility methods will be positively misleading. Syst Zool 27:401±410.

Forterre, P. (1999) Displacement of cellular proteins by func-tional analogues from plasmids or viruses could explainpuzzling phylogenies of many DNA informational proteins.Mol Microbiol 33: 457±465.

Gray, M.W., and Lang, B.F. (1998) Transcription in chloro-plasts and mitochondria: a tale of two polymerases. TrendsMicrobiol 6: 1±3.

Kelman, Z., and O'Donnell, M. (1995) Structural and func-tional similarities of prokaryotic and eukaryotic DNA poly-merase sliding clamps. Nucleic Acids Res 23: 3613±3620.

Koonin, E.V., Mushegian, A.R., and Bork, P. (1996) Non-ortho-logous gene displacement. Trends Genet 12: 334±336.

Mushegian, A.R., and Koonin, E.V. (1996) A minimal geneset for cellular life derived by comparison of completebacterial genomes. Proc Natl Acad Sci USA 93: 10268±10273.

Olsen, G.J., and Woese, C.R. (1996) Lessons from anArchaeal genome: what are we learning from Methano-coccus jannaschii? Trends Genet 12: 377±379.

Olsen, G.J., and Woese, C.R. (1997) Archaeal genomics: anoverview. Cell 89: 991±994.

Strimmer, K., and von Haeseler, A. (1996) Quartet puzzling:a quartet maximum likelihood method for reconstructingtree topologies. Mol Biol Evol 13: 964±969.

Yang, Z. (1996) Among-site rate variation and its impact onphylogenetic analyses. Trends Ecol Evol 11: 367±370.

Q 2000 Blackwell Science Ltd, Molecular Microbiology, 35, 1±5

Horizontal transfer of informational genes 5