Special Call for Papers COMPARATIVE GENOMICS Genome-wide ...
Transcript of Special Call for Papers COMPARATIVE GENOMICS Genome-wide ...
1
Special Call for Papers COMPARATIVE GENOMICS
Genome-wide Analysis of Restriction-Modification System in Unicellular and
Filamentous Cyanobacteria Fangqing Zhao1, Xiaowen Zhang1, Chengwei Liang1, jinyu Wu2, Qiyu Bao2, Song Qin1
1Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China 2Institute of Biomedical Informatics, Wenzhou Medical College, Wenzhou, China
Running head
Genome-wide Analysis of RM System
Corresponding author
Song Qin ([email protected])
Institute of Oceanology
Chinese Academy of Sciences
Qingdao, 266071, China
Qiyu Bao ([email protected])
Institute of Biomedical Informatics
Wenzhou Medical College
Wenzhou, 325000, China
Articles in PresS. Physiol Genomics (December 20, 2005). doi:10.1152/physiolgenomics.00255.2005
Copyright © 2005 by the American Physiological Society.
2
Abstract
Cyanobacteria are an ancient group of Gram-negative bacteria with strong genome size
variation ranging from 1.6Mb to 9.1 Mb. Here, we firstly retrieved all the putative
restriction-modification (RM) genes in the draft genome of Spirulina, and then performed
a range of comparative and bioinformatic analyses on RM genes from unicellular and
filamentous cyanobacterial genomes. We have identified 6 gene clusters containing
putative Type I RMs, and 11 putative Type II RMs or the solitary methyltransferases.
RT-PCR analysis reveals that 6 of 18 methyltranferases are not expressed in Spirulina,
whereas one hsdM gene, with a mutated cognate hsdS, was detected to be expressed. Our
results indicate that the number of RM genes in filamentous cyanobacteria is significantly
higher than that in unicellular species, and this expansion of RM systems in filamentous
cyanobacteria may be related to their wide range of ecological tolerance. Furthermore, a
coevolutionary pattern is found between hsdM and hsdR, with a large number of site pairs
positively or negatively correlated, indicating the functional importance of these pairing
interactions between their tertiary structures. No evidence for positive selection is found
for the majority of RMs, e.g., hsdM, hsdS, hsdR, and Type II REase gene families, while
a group of MTases exhibit a remarkable signature of adaptive evolution. Sites and genes
identified here to have been under positive selection would provide targets for further
research on their structural and functional evaluations.
Keywords
Coevolution, Comparative genomics, Horizontal gene transfer, Molecular evolution,
Spirulina platensis
3
INTRODUCTION
Cyanobacteria are among the oldest life on Earth with the capacity of oxygenic
photosynthesis similar to the process found in higher plants. Cyanobacteria may be
unicellular or filamentous, and they can be found in many different environments ranging
from marine to freshwater habitats and also in soil, on rocks and on plants. Spirulina, a
nonheterocystous, filamentous cyanobacterium, is proposed as an important mariculture
crop both for food and food supplement, and as a potential chemical factory for a variety
of useful products if effective genetic engineering can be accomplished. Unlike other
unicellular (eg. Synechococcus and Synechocysits) and filamentous cyanobacteria (eg.
Anabaena), genetic transformation of Spirulina have had limited success to date (18, 52).
The predominant barrier appears to the presence of restriction-modification (RM)
systems in Spirulina platensis, which may decrease or prevent the uptake of exogenous
DNA (53).
RM systems comprise a restriction endonuclease (REase) activity that specifically
cleaves DNA and a corresponding methyltransferase (MTase) activity that specifically
methylates the host DNA, thereby protecting it from cleavage (4, 55). RM systems can be
subdivided into four groups (I, II, III, and IV) based on the subunit composition, cofactor
requirements, sequence specificity, and cleavage position (40, 55). Moreover, those
MTases not associated with REases are termed solitary MTases. RM systems in the host
genome have been viewed as a defense mechanism against DNA invasion or as the
parasitic selfish genetic elements (22, 42). The selfish gene hypothesis is now well
supported by much evidence from genome analysis and experimentation (23). The
traditional method used to screen for restriction endonucleases is to culture individual
strains and test their extracts for the ability to produce specific fragments from small
DNA molecules (55). A large number of both filamentous and unicellular cyanobacteria
have been shown to contain one or more type II site-specific REases (29). Lyra et al. (29)
employed principal component analysis to demonstrate the relationship between certain
REases and their hosts, and found that all 28 cyanobacterial strains studied contain Type
II REases, with the most common cyanobacterial REase types, AvaII, AvaI, and AsuII.
Matveyev et al. (31) used experimental and bioinformatics approaches to give a full
insight on the MTases of the cyanobacterium Anabaena sp. PCC7120, which has four
4
Type II MTases associated with REases (AvaI-IV), and five solitary MTases.
Comparative analysis of cyanobacterial MTases revealed that the solitary ones appear to
be of ancient origin in cyanobacteria, while the Type II MTases appear to have arrived by
horizontal gene transfer (31).
Now 15 cyanobacterial genomes have been fully sequenced, 3 in the draft format,
and more than 20 are in the process of being sequenced
(http://img.jgi.doe.gov/pub/main.cgi? page=restrictedMicrobes&domain=Bacteria;
http://www.ncbi.nlm.nih.gov/). These genome-sequencing projects undoubtedly bring a
great convenience to search for novel RM genes using bioinformatic tools. We have
generated more than 7Mbp genomic sequences with 8× coverage of redundancy from the
shot-gun genome-sequencing project of Spirulina platensis (Bao et al., unpublished data).
Genome-wide identification of RM systems in Spirulina will help to develop an efficient
gene transfer system in this organism. Here, we reported all the putative RM superfamily
from the draft genome of Spirulina platensis and presented a comparative genomic
analysis of RM genes in the unicellular and filamentous cyanobacteria.
MATERIALS AND METHODS
Computational search for novel RM genes
The cyanobacteria species examined included Synechocystis, Synechococcus,
Prochlorococcus, Anabaena, Nostoc, Spirulina, and Trichodesmium. An initial set of RM
genes was obtained from CyanoBase (http://www.kazusa.or.jp/cyano/cyano.html) and
IMG v.1.1 (http://img.jgi.doe.gov/v1.1/main.cgi). This dataset, including
well-characterized and putative cyanobacterial RMs, was used to construct a query
protein set. Each protein in this query dataset was used to search the potential novel
sequences in all cyanobacteria species with whole genome sequences available, by using
the BLASTP and TBLASTN programs, with e-value<1e-10; the searches were iterated
until convergence. Hits produced by different query sequences were combined and
duplicate records, which were identified on the basis of accession, were removed.
Because RM genes tend to form clusters (55), all cyanobacterial genomic clusters
containing RM sequences were also retrieved and translated into six open reading frames
and examined manually for the presence of the RM motifs in order to discover potential
5
sequences with distant homology. Then profile recognition programs (45, 46), including
PFAM (http://pfam.wustl.edu/) and SMART (http://smart.embl-heidelberg.de/), were
used to determine the presence of RM motifs, and multiple sequence alignments were
used to finally classify a protein as RM homolog. Overall, six motifs (HSDR_N, DEXDc,
Methylase_M, Methylase_S, N6_Mtase, and MethyltransfD12) were used in this study.
All the putative RM genes identified in Spirulina genome have been submitted to the
GenBank. Accession numbers are as follows: DQ239732-DQ239748.
RT-PCR analysis of RM genes in Spirulina
Total RNA was extracted from fresh Spirulina using RNeasy Mini kit (QIAGEN,
Germany) by following manufacture’s protocol. Residual DNA in RNA preparations was
eliminated by digestion with RNase-free DNase (PROMEGA, US). The absence of DNA
was checked by PCR. The purified total RNA was reverse transcribed into cDNA using
AMV reverse transcriptase (TAKARA, China). The following primers used for RT-PCR
were designed based on the coding region of each gene: I-hsdM: forward primer
CGATGCGACGGAGTTTAG, reverse primer GCTTTTGTTCCGCCTTTT; II-hsdM:
forward primer TACTCGGCACGCTGTTTT, reverse primer
CCCTGCCAACCCTTTTAG; III-hsdM: forward primer GCGTTCCTGTTTTTGCAG,
reverse primer TGGTCTTGCCCATATTGC; IV-hsdM: forward primer
GGCATTATCGGCTTACCC, reverse primer GTGGAGGTCTTCCGGTTC; IV-MTase:
forward primer GACCCCTTTGCAGGTAGC, reverse primer
TGTTCGGAATAGGGCAGA; V-hsdM: forward primer GCCTAAATCAGCCCCATC,
reverse primer GCCTCACAATCAGCTTGC; V-MTase: forward primer
ATCACAATCACCACAGAACC, reverse primer ATGTTGAGCCAAAAAGAGC;
VI-hsdR: forward primer TATCGCCTGCTGCGTTAT, reverse primer:
GCGGAGTCCTTGGGTATC; MTase1: forward primer GACGTAGCAACACACTTTC,
reverse primer GCAAATAGAAAAGCCTTGG; MTase2: forward primer
TTAACCCACGAATCAATCC, reverse primer CGAGAGACTGTAAAACCTCC;
MTase3: forward primer GGGAAAAACTTACAACCAC, reverse primer
GGCTAATAAAGGGGGAAC; MTase4: forward primer TGCCCAGTGATAGTTTTTC,
reverse primer CCAGGGCATCAATTACC; MTase5: forward primer
TCGCCTTTATCCCTATCG, reverse primer CATACCATCAGGAGGAACC; MTase6:
6
forward primer GGCCAGAAGCTGAAACC, reverse primer
GAACCCACCAACTCAAACC; MTase7: forward primer AAATTATGCCTCCAAGC,
reverse primer CCTGCCACATATTCAGC; MTase8: forward primer
CCAAACGATAACAGCACC, reverse primer CCAGCGACAGAAAAACC; MTase9:
forward primer TTATCCGCCGCCAATTACC, reverse primer
TGTCCATTTCGCACGTCG; MTase10: forward primer TTGCAGATCCTCCCTACA,
reverse primer AATTAATTTACGTTGTGCG; MTase11: forward primer
TAGCAATTCGCTTGACC, reverse primer AGCATCACCTTGACTCC.
Multiple sequence alignment and phylogenetic analysis
Multiple sequence alignments were carried out using the Clustal W program (48),
and then adjusted manually. Graphical representations of the multiple amino acid
sequence alignments, the sequence logos, were presented using WebLogo (7). To
maximize the number of sites available for analysis, partial sequences and certain
sequences with large deletions were excluded from the analysis. The neighbor-joining
method and maximum parsimony method in MEGA3 (28) were used to construct the
phylogenetic tree, and the reliability of each branch was tested by 1,000 bootstrap
replications.
Molecular evolutionary analysis
Detecting structural interactions and statistical covariance among separate amino
acid sites is of great significance for understanding protein structure and evolution (2, 38).
Such analyses are based on the assumption that functionally significant coordinated
residues in proteins result from physicochemical interactions (e.g., hydrophobic,
electrostatic). These interactions are dependent on the physicochemical properties (e.g.,
volume, charge, and hydrophobicity) of the residues (50). Here, we estimated the
pairwise correlation of residue substitutions among several motifs in hsdM and hsdR.
This approach based on the estimation of the linear correlation coefficients between the
amino acid physicochemical property values, was fully described by Afonnikov and
Kolchanov (1). Different characteristics of amino acids (volume, hydrophobicity, and
polarity) were chosen for coevolution analysis, which reflect physical and chemical
interactions between residues. Different sequence weighting methods (14, 54) were also
used in these tests to avoid sequence bias.
7
One of the most stringent methods to detect positive selection is a comparison of the
rate of nonsynonymous substitutions (dN) to the rate of synonymous substitution (dS). The
ratio between these rates (ω= dN/dS) is a reliable measure of the selective pressure acting
on a protein-coding gene, with ω=1 indicating neutral evolution, ω<1 purifying selection,
and ω>1 positive selection (58). Likelihood ratio tests (LRTs) were performed using
Codeml with the site-specific models (M1/M2, M7/M8) of the PAML package (57).
When parameter estimates under a model allowing positive selection suggest the presence
of a number of sites with ω>1, Bayes theorem was used to calculate their posterior
probabilities (56).
RESULTS
The RM superfamily in Spirulina
In order to identify all members of the RM superfamily, we performed TBLASTN
searches of the Spirulina in-finishing genome database with RM genes for the bacterial
and archaebacterial homologs. Similar searches were run with the Pfam domains
(HSDR_N, DEXDc, Methylase_M, N6_Mtase, Methylase_S, and MethyltransfD12) of
the RM proteins to avoid the exclusion of highly diverged sequences with just limited
conserved motifs. PFAM and SMART domain analyses with the derived sequences
employed as queries were then carried out to eliminate false positives. After above
analyses, a total of 30 putative RM genes were obtained from the Spirulina genome
database (Fig.1, Supplementary Table 1). Type I RM systems comprise three closely
linked subunits, which can distinguish them obviously from other type of RM genes
without further experimental research. In this study, the symbol for Type I RM system
was hsd, thus the genes were hsdR, hsdM, and hsdS, whereas other types of RM genes or
the solitary MTases were uniformly termed as REase or MTase.
Six clusters of the Spirulina genome, ranging from 4.7kb to 11.7kb, were found
containing the putative Type I RM systems (Fig. 1). In cluster I, hsdM (688aa) and hsdS
(392aa) should be co-transcribed from the same promoter, with four nucleotide overlap.
hsdR gene was in the downstream of the hsdM/S gene partitioned by two ORFs (putative
ATP-binding protein and orf1). In cluster II, hsdS subunit was interrupted by a stop
codon (DQ239733, 2642-TAAATCGTG) in the coding frame, which resulted in two
8
separate but frame-integrated specificity domains (target recognition domains, TRDs). In
the downstream of this hsdS gene, a pin homolog encoding a site-specific recombinase
was found on the opposite strand, which could mediate DNA inversion and phase
variation in prokaryotes (10, 51). Two RM loci were found in cluster IV: one was the
putative Type I hsdM/S genes, the other was composed of two ORFs encoding the m4C
type of methyltransferase (IV-MTase) and DUF820. DUF820 sequences form a family of
hypothetical proteins from bacteria that have greatly expanded in a number of
cyanobacterial species. Kinch et al. (19) identified this family should belong to a novel
REase with the conserved motifs that make up the nuclease active site. Hence, this ORF
should be the cognate REase of IV-MTase, although it showed little significant
resemblance to any known Type II REases. All the HsdR and HsdM identified in the
Spirulina genome shared high similarity with the RM proteins in other prokaryotic
organisms. HsdS, however, shared less sequence similarity (5%-33%) from their
counterparts in cyanobacteria. Some of RM genes in Spirulina should be defective, as
judged by the nonsense mutation (e.g. II-hsdS) or the loss of conserved motifs (e.g.
V-hsdM, MTase11).
We also searched the potential type II or solitary MTases containing conserved
MTase sequence motifs (Pfam: MethyltransfD12) and the adjacent putative REases.
Since the sequences of type II MTases possess a moderate level of sequence conservation,
most of them should be identified with the computational searches. In contrast, REases
only share sequence similarity when they have similar recognition specificities (43).
Therefore, only four MTases could be identified to possess cognate REases in the
Spirulina genome (Fig.1), and whether other closely linked ORFs function as the
endonucleases can be clarified by experimental approach. Four cognate REases (REase1,
2, 6, and 7) share significant sequence similarities with known AvaIVAP (GCTNAGC, p
value=8E-37), AgeI (ACCGGT, p value= 2E-81), BsuBI (CTGCAG, p value= 2E-50),
and HindVP (GRCGYC, p value= 1E-82), respectively. These MTases can be annotated
into three groups: m4C (IV-MTase, V-MTase), m5C (MTase1-9), and m6A (MTase10,
11) based on querying the REBASE (41).
RT-PCR analysis of RM genes in Spirulina
Since gene expression patterns can provide important clues for gene function, we
9
employed RT-PCR experiments to characterize gene expression profiles of the Spirulina
methyltransferases. Previous studies revealed that many of RM systems in Anabaena sp.
7120 or Helicobacter pylori J99 were partially or completely inactive (26, 31). Similar
results were also found in the RM systems of Spirulina platensis. As shown in Fig.2, 6 of
18 methyltransferase genes were not expressed in Spirulina. Notably, the hsdM gene in
II-hsd system was expressed, whereas its cognate hsdS gene was defective, with a stop
codon in the coding frame resulting in two split TRD domains. Most of the HsdS subunits
carry two separate TRDs, providing the symmetry for their interaction with the bipartite,
asymmetric DNA target (21). Such mutation in hsdS may cause the inactivation of the
RM system. However, MacWilliams and Bickle (30) found that a truncated HsdS subunit
comprising a single TRD and a set of conserved motifs could also function as a dimer,
specifying the bipartite, symmetric DNA target. Moreover, HsdS subunits can be
exchanged between two families of Type I RMs (44). In some cases, HsdM seems to be
functional with a defective cognate HsdR, which may serve to protect the genome from
the attack by RM systems (15, 47). The expression of II-hsdM indicated that frameshift or
missense mutations in one subunit could not always lead to the inactivation of the whole
RM system. Therefore, this points out the need for a cautious approach in evaluating the
activity of each subunit in RM systems.
Genomic distribution and sequence analysis of RM genes
Computer-based nucleotide and protein searches were performed using different
BLAST search programs at NCBI (http://www.ncbi.nlm.nih.gov/BLAST) and JGI
(http://img.jgi.doe.gov/v1.1/main.cgi?page=geneSearchForm). Protein sequences of
RM genes previously described in REBASE were used as queries for database. All the
putative RMs were listed in the supplementary Table 1. To elucidate the complete
genomic structure of the RM genes, we mapped them onto the unicellular and
filamentous cyanobacterial genomes (Fig. 3). As the Spirulina platensis genomic
sequences were still in the draft format, all the putative RMs were then mapped onto the
chromosome evenly, which do not represent their actual locations.
Sequences giving the better reciprocal BLAST hits were tentatively assumed to
identify homologous counterparts in two species if they could be aligned over at least the
BLAST-Score > 90 and the E-value < 1E-10. By these criteria, all the RM domains
10
derived from the genome of Spirulina platensis, Anabaena sp. 7120, and Anabaena
variabilis ATCC 29413 were examined to have quite different sequence similarities.
Some MTases (MTase1-3) in Anabaena variabilis ATCC 29413 had more homologs with
high similarities in the other two filamentous cyanobacteria, while other MTases have
less or no counterparts in Spirulina and Anabaena sp. 7120. Moreover, no obvious
similarities (usually less than 20%) were found among the specificity genes (HsdS),
which encode polypeptide sequences responsible for the recognition of the target
sequence. A striking example of evolutionary conservation is provided by the genes
encoding Type I RMs in Anabaena sp. 7120 (hsd2 and hsd3) and Anabaena variabilis
ATCC 29413 (hsd2). The amino sequences of these gene products (HsdR and HsdM) are
nearly identical in Anabaena sp. 7120; 96% identity was preserved in HsdR/M between
Anabaena sp. 7120 and Anabaena variabilis ATCC 29413. Interestingly, all three Type I
RMs in Anabaena variabilis ATCC 29413 shared some similarities with the HsdR/M in
Anabaena sp. 7120, whereas the Type I RMs in Spirulina exhibited much less sequence
similarity with those in the Anabaena species.
Phylogenetic analysis
RMs identified from cyanobacteria and other prokaryotic organisms, such as
proteobacteria, archaea, chlorobi, firmicutes, spirochaetes, and actinobacteria, were used
to construct phylogenies to discern the evolutionary history of RM system. The 16s
rDNA phylogeny clearly showed separation of prokaryotes into several monophyletic
clades, and all these branches had significant statistical support (Fig. 5A). The hsdM and
hsdR phylogenies illustrated in Fig. 4, however, exhibited quite different topologies,
where different sources of hsd genes scattered throughout the tree. Although several
monophyletic groups were inferred with strong bootstrap support, these groups did not
correspond to their 16s rDNA phylogeny. Further, in the filamentous species of
cyanobacteria, the hsd genes were usually present in several copies, and based on the tree
topology and their genomic organization these copies might have quite different
evolutionary histories. For example, hsd2 and hsd3 of Anabaena sp. 7120 were adjacent
to each other along the chromosome, and were nearly identical at the amino acids level.
These two copies most likely have evolved through recent duplication, whereas evolution
of other RMs (hsd1, hsd4 and hsd5) is more complex and probably involves horizontal
11
gene transfers.
Unlike the wide dispersed distribution of hsd genes through the phylogeny, most of
the cyanobacterial MTases in Fig. 5B formed a separate clade (77% bootstrap value), with
the exception of six sequences (A7-MTase1, A7-MTase3, Se-MTase1, Se-MTase2,
S6-MTase2, and S6-MTase3). The archaebacterial sequences were located elsewhere on
the tree. Most of the γ-proteobacterial sequences formed a separate clade with 99%
bootstrap support. The exception was the sequences from Moraxella bovis
(PG-BAA03071) and Chromohalobacter salexigens (PG-ZP_00473324) that grouped
with other prokaryotes. Moreover, two viral MTases were clustered with four
β-proteobacterial sequences (100% bootstrap value), in which Burkholderia
(ZP_00500806) was the host of the bacteriophage phiE125 (YP_164085). The MTases in
both genomes shared a great similarity (>99%), indicating the horizontal transfer of
MTases between the bacterial host and the phage. These results evidently suggested
multiple lateral transfers of RM genes both within and between the major clades of the
tree (Fig. 4 and Fig. 5).
We also performed phylogenetic analysis separately for each Pfam domain of HsdR
and HsdM (HSDR_N, DEXDc, Methylase_M, and N6_Mtase), and phylogenies obtained
from these different domains were quite similar to the phylogenetic trees illustrated in
Fig.4 (Data not shown). By contrast, phylogenetic analyses using different domains of
HsdS gave different phylogenetic topologies with no strong bootstrap support (Data not
shown). HsdS usually contains two Pfam domain (Methylase_S) repeats, and little
similarity exists outside of these motifs. The diversification of hsdS genes to achieve
maximal variation of specificity should be driven by strong selective pressures (55).
Evolutionary analysis of RM genes
HsdM and HsdR, as with interacting domains, must co-evolve, to form an integral
and active complex with HsdS subunit. Phylogenetic trees (Fig.4) based on NJ method
showed extensively similarity in clustering of cognate hsdM and hsdR genes, also
indicating the coevolution of two subunits of Type I RMs. Previous studies revealed
several conserved motifs in HsdM, among which motif I (D/E/SXFXGXG) can bind the
AdoMet, and motif IV (N/DPPF/Y/W) can be involved in substrate binding or in the
catalytic activity (34). As for HsdR, several conserved motifs were also found in the
12
DEAD-box domain, which formed a NTP-binding pocket and a portion of nucleic
acid-binding site (13, 34).
We estimated the pairwise correlation of five domains containing above functional
motifs. Different sequence weighting methods (14, 54) used to avoid sequence bias gave
similar results, and only the results derived from the Vingron and Argos (54) approach
were shown. We firstly considered such amino acids property as volume (size) for
coevolution analysis. From Fig. 6, we can see that many site pairs are highly correlated in
a positive or negative way at the 99.9% level. In hsdM-motif I, seven sites (7-11, 13, 15)
were correlated with the sites in the other hsdR-motifs, and these sites mostly resided in
or were surrounding the featured residues. Similar results were found in the hsdM-motif
IV. Sites close in the primary sequence are likely to be close in the tertiary structure, and
are therefore likely to have coevolved. In this study, however, the observed numbers of
positively or negatively coevolving site pairs within the distant sites were significantly
higher than those close pairs, indicating that the amino acids size interactions, especially
those distant site pairs, should play a important role in the tertiary structure formation.
Another amino acids characteristic, hydrophilicity, was also used to perform the
coevolution analyses. From a structural and functional perspective, correlated amino acid
replacements may have occurred among sites to maintain the same hydrophilicity
properties, size constraints, electrostatic charges etc. Many pairs of residues showed high
correlations, and nearly 1/3 of the coevolving site pairs (70 of 212) were identical to
those computed for the residues size-effect analysis (Fig.6). The hydrophilicity effect,
however, was different with the size correlations in the hsdM-motif I, where much fewer
coevolving site pairs were found in this domain. Nearly all the sites in the hsdR-motif I,
Ia, II, III exhibited strong positive or negative correlations, revealing the great
significance of these sites to form a hydrophobic hydrolysis pocket.
To analyze the possibility of positive selection acting on specific codons in the RM
family, likelihood ratio test (LRT) was used to test for variation in the dN/dS ratio among
sites in sets of closely related RM genes. However, most of them were highly divergent,
and it was problematic to align such cases with confidence. For these reasons, we
performed most of our analyses with local clades of 10-15 sequences, which could keep
genetic divergence reasonably small and keep alignments robust for LRT analysis. In the
13
hsdM, hsdS, hsdR, and Type II REase gene families, all site-specific models provided
little or no evidence for positive selection (Data not shown).
In contrast, with the MTase gene family we obtained results indicating positive
selection, and it was estimated that a total of 10.2% sites were likely to be positively
selected (ω=31.1, p<0.001). Twelve highly homologous MTases (Sp-Mtase1, Sp-Mtase4,
Av-Mtase2, A7-Mtase5, A7-Mtase10, A7-Mtase8, S6-Mtase1, Np-402028720,
AAA50432, YP_199246, BAC81824, ZP_00517809), which could keep the alignment
robust for LRT analysis, were used to construct phylogenetic tree. Codon usage bias is
known to affect the estimation of synonymous and nonsynonymous substitution rates (59).
Thus two different codon usage bias models were used in all our analyses: F3×4 and F61.
The results under these two models were quite similar, and those under F3×4 were
presented only. To examine the relationship of sites under positive selection to the protein
structure, we mapped these sites onto the MTase tertiary structure (Fig.7). Firstly, the
sequence alignment of MTase protein sequences (m5C family) previously used for
PAML analysis, was submitted to the protein modeling server: SWISS-MODEL
(http://swissmodel.expasy.org//SWISS-MODEL.html) with PDB-1dctA (39) as the
modeling template. The derived structure using Av-MTase2 as the target sequence was
shown in Fig.7A. Several positively selected residues (482S, 484M, 494G, 507S, 508S)
were located at the MTase C-terminal extension of approximately 50 aa, where no
coordinate residues were found in the template (1dctA), and thus they could not be
mapped onto the tertiary structure. Although the other predicted positively selected sites
appeared to be relatively uniformly distributed along the protein sequence, a clear pattern
was revealed when the sites were examined in relation to the tertiary structure: they were
all exposed amino acids, and were clustered on one face of the dimer towards the nucleic
acids (Fig.7B). Two sites were located in the DNA binding pocket: one was site 376A at
the scaffold region of the small domain; another site 197T was located at one of the DNA
backbone-binding motifs adjacent to the active site (39). Thus, these sites were available
to participate in DNA-protein interactions. Moreover, other sites were located at the helix
D (206Q, 213E, 217N) or the coil linking β3 and β4 (126S, 131I).
DISCUSSIONS
14
To date, more than 100 cyanobacterial strains are known to produce restriction
endonucleases (41). Generally, only one or two restriction endonucleases were found in
each strain of cyanobacteria through conventional methods to test the crude cell extracts
for their restriction activity (29, 37). The main reason is the difficulty of detecting type I
RMs or solitary DNA methyltransferases by biochemical or genetic assay (34, 41).
Genome-wide screening of RM systems based on the genome-sequencing project,
however, leads to a virtual explosion in the number of putative RM genes, which provides
us a new and comprehensive insight into the cyanobacterial RM systems.
In this study, we have identified and analyzed a large number of putative RM genes
from all the finished or ongoing cyanobacterial genomes. From Fig.2 and supplementary
Table 1, we can see that more RM genes are found in filamentous cyanobacteria
(Anabaena, Spirulina, Nostoc) rather than in unicellular species (Synechocystis,
Synechococcus, Prochlorococcus). We also found that the bacterial or archaebacterial
genomes in REBASE with tremendously large number of RM genes are usually
pathogenic bacteria (e.g. Campylobacter jejuni, Helicobacter pylori, Neisseria
gonorrhoeae) or extremophiles (e.g. Chloroflexus aurantiacus, Methanocaldococcus
jannaschii). In spite of a wide variation in RM contents between closely related species,
these pathogenic bacteria or extremophiles have a significantly higher number of RMs
than the other species (T-test, p<0.001). It is considered that such disproportionate
expansion of certain gene families allows these organisms to adapt to a wider range of
environmental conditions, or exploit a greater variety of nutrient sources (25). When we
ruled out the chromosome size effect and calculated the relative number of
defense-related genes out of the putative COGs in cyanobacteria, it is still significantly
greater than that in unicellular species, and similar results were also found in the
cyanobacterial signal transduction systems (p<0.001, data not shown). Unicellular
Prochlorococcus, living in the oligotrophic open ocean, possesses no or a limited number
of RM systems, as well as other genes involved in environmental sensing and regulatory
systems. Dufresne et al. (9) assumed that the major driving force behind genome
reduction within the Prochlorococcus radiation has been a selective process favoring the
adaptation of this organism to its environment. Most of filamentous cyanobacteria exhibit
a wide range of ecological tolerance and are found in freshwater, marine and terrestrial
15
habitats. A good example is Nostoc punctiforme, a filamentous nitrogen-fixing
cyanobacterium, which displays an extraordinarily wide range of vegetative cell
developmental alternatives, physiological properties and ecological niches, including
broad symbiotic competences with plants and fungi (32). The identification of a new
family of putative PD-(D/E)XK nucleases in various cyanobacteria also indicated that the
expansion of such nuclease family should be correlated with their host’s ecological
plasticity (11).
Nobusato et al. (36) analyzed all the RM genes identified from the complete genome
sequences of two Helicobacter pylori strains, and found that these genes were scattered
among diverse groups of bacteria in the phylogeny. Horizontal gene transfer may have a
considerable influence on the widespread distribution of RM systems among bacteria (23).
Similar results were found in the phylogenies of cyanobacterial RMs. However, most of
the Type II or the solitary MTases in cyanobacterial genomes seemed to form a separate
clade, in contrast with the wide dispersed distribution of Type I RMs in the phylogeny.
Matveyev et al. (31) considered the solitary MTases to be the ancient orgin within
cyanobacteria. Here, the MTases that fell out of the cyanobacterial clade were all solitary
ones (Fig.5), which shared higher similarities with their homologs in proteobacteria or
firmicutes, whereas the solitary MTases in Anabaena variabilis ATCC 29413 and
Spirulina platensis were all tightly grouped with the Type II MTases, indicating the
horizontal gene transfer of the solitary MTases between cyanobacteria and other bacteria.
In Spirulina, most of the MTases were identified to be orphan genes with the absence of
their cognate REases in the closely linked loci. One explanation may be a failure
detection of the REase gene because of its high divergence or pseudogenization. The
other concern was that RM genes were sequentially transferred into the host genome,
with the modification gene being transferred first. These solitary genes may acquire a
unique function compared with other MTases in RM systems (23, 47). Most of the Type I
RM systems in cyanobacteria, however, seemed to be transferred en bloc. Three subunits
of Type I RM system, hsdM, hsdS, and hsdR, are usually closely linked. As shown in
Fig.4 and Fig.6, the similar phylogenies of HsdM and HsdR and several highly coevolved
motifs between two subunits suggest that en bloc lateral transfer of such linked genes and
cooperative evolution should be more common in the Type I RM systems. After the host
16
acquires the novel RM genes, sequence permutation or gene duplication seems to be the
major mechanism in the diversification of RM systems (5, 16).
In Type I RM system, HsdR, HsdM and HsdS form a complex (R2M2S1) to perform
both endonuclease and methyltransferase activities (4, 34, 55). In comparison with the
extensive literatures describing the conserved motifs in HsdM and HsdR and elucidating
their functions, covariation analyses of both subunits are rare. Previous studies identified
seven conserved motifs in the DEAD-Box of HsdR, and all these motifs are involved in
the ATP binding/hydrolysis reaction (13). Here, we applied the CRASP method (1) to
estimate correlation coefficients of six conserved domains in hsdM and hsdR proteins,
and found a significant (at the 99.9% level) pairwise correlation among these domains. In
hsdR subunit, motif I and II are directly involved in catalyzing the NTP hydrolysis
reaction; motifs Ia interacts with the sugar-phosphate backbone, whereas motif III is
involved in hydrogen bond and stacking interactions with the DNA bases (13). Korolev et
al. (27) found that residues in motifs Ia and III are in close proximity to a loop containing
the aspartate and glutamate residues in motif II, suggesting a potential role for these
regions in the transduction of energy from the ATPase site to the DNA molecule.
Site-directed mutations in these motifs usually result in greatly decreased ATP hydrolysis
and eliminated helicase activity (8, 13). Here, we identified that a large number of site
pairs in these motifs are positively or negatively correlated, indicating these residues
should function in forming a flexible hydrophobic hydrolysis pocket. It appears likely
that subtle changes in ATP or DNA binding characteristics might arise through minor
conformational changes associated with the structural interactions among these motifs.
However, further experiments are required to examine such amino acid changes for their
functional implications.
RM genes seem to be frequently exchanged between species (24), and to evolve
quickly after invading into the new host (17). To elucidate the selective pressure under
the diversification of RM genes, the likelihood ratio test was employed to identify
whether those common ancestral genes were under positive selection after they were
transferred into the new hosts. Unlike the ‘arm race’ effect occurring on other
defense-related genes, no signature of adaptive evolution was found in the RM systems
except some groups of MTases. Although we cannot rule out that some groups of genes in
17
these families are under positive selection, it is in fact that many genes, especially for
Type I RM systems, are pseudogenes in various states of decomposition (26, 31).
Pairwise dN/dS analysis of Type I RM genes from closely related species also showed
strong signatures of relaxed selective constraints rather than positive selection (Data not
shown). These findings strongly suggest that the evolution of RM genes does not follow
the same pattern as most of the defense-related genes.
An exception is the MTase gene family, which has a total of 10.2% sites with
elevated rates of nonsynonymous substitution. The majority of these residues could be
precisely located on the MTase tertiary structure, which provides a structural
interpretation of adaptive evolution. Two positively selected residues are located in the
DNA binding pocket, whereas the other five residues are located between motif III and
IV (126S, 131I), or between VI and VII (206Q, 213E, 217N) when they were mapped
onto the secondary structure of M.HaeIII (3, 39). Extensive studies revealed that the
variable region between motif VIII and IX determines specificity (6, 20, 33, 39).
However, Timar et al. (49) found that two amino acid replacements between motif VI and
VII resulted in relaxed recognition specificity of SinI DNA-methyltransferase, suggesting
some specificity-determining elements should be located in the large domain. LRT
analysis also suggested these regions were subjected to strong adaptive evolution, and
some residues replacements in these motifs might provide structural variations and
thereby generated novel characteristics for the recognition of new target sites. It still
remains unclear how positive selection really operated on such specificity-determining
domains. Further studies focusing on the intraspecific homologous genes would be useful
to address the role of selective pressure in such families.
ACKNOWLEDGMENTS
This work was supported by grants from Key Innovative Project (KZCX3-SW-215) of
the Chinese Academy of Sciences.
REFERENCES
1. Afonnikov DA and Kolchanov NA. CRASP: a program for analysis of coordinated
substitutions in multiple alignments of protein sequences. Nucleic Acids Res 32:
W64-W68, 2004.
2. Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, and Dress AW. Correlations
18
among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol
Biol Evol 17: 164-178, 2000.
3. Bujnicki JM. Homology modelling of the DNA 5mC methyltransferase M.BssHII. Is
permutation of functional subdomains common to all subfamilies of DNA
methyltransferases? Int J Biol Macromol 27: 195-204, 2000.
4. Bujnicki JM. Understanding the evolution of restriction-modification systems: Clues
from sequence and structure comparisons. Acta Biochim Pol 48: 935-967, 2001.
5. Bujnicki JM. Sequence permutations in the molecular evolution of DNA
methyltransferases. BMC Evol Biol 2: 3, 2002.
6. Cheng X, Kumar S, Posfai J, Pflugrath JW, and Roberts RJ. Crystal structure of
the HhaI methyltransferase complexed with S-adenosylmethionine. Cell 74: 299-307,
1993.
7. Crooks GE, Hon G, Chandonia JM, and Brenner SE. WebLogo: A sequence logo
generator [online]. http://weblogo.berkeley.edu [2004].
8. Davies GP, Powell LM, Webb JL, Cooper LP, and Murray NE. EcoKI with an
amino acid substitution in any one of seven DEAD-box motifs has impaired ATPase and
endonuclease activities. Nucleic Acids Res 26: 4828-4836, 1998.
9. Dufresne A, Garczarek L, and Partensky F. Accelerated evolution associated with
genome reduction in a free-living prokaryote. Genome Biol 6: R14, 2005.
10. Enomoto M, Oosawa K, and Momota H. Mapping of the pin locus coding for a
site-specific recombinase that causes flagellar-phase variation in Escherichia coli K-12. J
Bacteriol 156: 663-668, 1983.
11. Feder M and Bujnicki JM. Identification of a new family of putative PD-(D/E)XK
nucleases with unusual phylogenomic distribution and a new type of the active site. BMC
Genomics 6: 21, 2005.
12. Guex N, Diemand A, and Peitsch MC. Protein modelling for all. Trends Biochem
Sci 24: 364-367, 1999.
13. Hall MC and Matson SW. Helicase motifs: the engine that powers DNA
unwinding. Mol Microbiol 34: 867-877, 1999.
14. Henikoff S and Henikoff JG. Position-based sequence weights. J Mol Biol 243:
574-578, 1994.
19
15. Ibanez M, Alvarez I, Rodriguez-Pena JM, and Rotger R. A ColE1-type plasmid
from Salmonella enteritidis encodes a DNA cytosine methyltransferase. Gene 196:
145-158, 1997.
16. Jeltsch A. Circular permutations in the molecular evolution of DNA
methyltransferases. J Mol Evol 49: 161-164, 1999.
17. Jeltsch A and Pingoud A. Horizontal gene transfer contributes to the wide
distribution and evolution of type II restriction-modification systems. J Mol Evol 42:
91-96, 1996.
18. Kawata Y, Yano S, Kojima H, and Toyomizu M. Transformation of Spirulina
platensis strain C1 (Arthrospira sp. PCC9438) with Tn5 transposase-transposon
DNA-cation liposome complex. Mar Biotechnol 6: 355-363, 2004.
19. Kinch LN, Ginalski K, Rychlewski L, and Grishin NV. Identification of novel
restriction endonuclease-like fold families among hypothetical proteins. Nucleic Acids
Res 33: 3598-3605, 2005.
20. Klimasauskas S, Nelson JL, and Roberts RJ. The sequence specificity domain of
cytosine-C5 methylases. Nucleic Acids Res 19: 6183-6190, 1991.
21. Kneale GG. A symmetrical for the domain structure of type I DNA
methyltransferases. J Mol Biol 243: 1-5, 1994.
22. Kobayashi I. Behavior of restriction-modification systems as selfish mobile elements
and their impact on genome evolution. Nucleic Acids Res 29: 3742-3756, 2001.
23. Kobayashi I. Restriction-modification systems as minimal forms of life. In:
Restriction endonucleases, edited by Pingoud A. Berlin: Springer-Verlag, 2004, p.19-62.
24. Kobayashi I, Nobusato A, Kobayashi-Takahashi N, and Uchiyama I. Shaping the
genome restriction-modification systems as mobile genetic elements. Curr Opin Genet
Dev 9: 649-656, 1999.
25. Konstantinidis KT and Tiedje JM. Trends between gene content and genome size
in prokaryotic species with larger genomes. Proc Natl Acad Sci USA 101: 3160-3165,
2004.
26. Kong H, Lin LF, Porter N, Stickel S, Byrd D, Posfai J, and Roberts RJ.
Functional analysis of putative restriction-modification in the Helicobacter pylori J99
genome. Nucleic Acids Res 28: 3216-3223, 2000.
20
27. Korolev S, Hsieh J, Gauss GH, Lohman TM, and Waksman G. Major domain
swiveling revealed by the crystal structures of complexes of E.coli Rep helicase bound to
single-stranded DNA and ADP. Cell 90: 635-647, 1997.
28. Kumar S, Tamura K, and Nei M. MEGA3: Integrated software for molecular
evolutionary genetics analysis and sequence alignment. Brief Bioinform 5: 150-163,
2004.
29. Lyra C, Halme T, Torsti A-M, Tenkanen T, and Sivonen K. Site-specific
restriction endonuleases in cyanobacteria. J Appl Microbiol 89: 979-991, 2000.
30. MacWilliams MP and Bickle TA. Generation of new DNA binding specificity by
truncation of the type IC EcoDXXI hsdS gene. EMBO J 15: 4775-4783, 1996.
31. Matveyev AV, Young KT, Meng A, and Elhai J. DNA methyltransferases of the
cyanobacterium Anabaena PCC7120. Nucleic Acids Res 29: 1491-1506, 2001.
32. Meeks JC, Elhai J, Thiel T, Potts M, Larimer F, Lamerdin J, Predki P, and
Atlas R. An overview of the genome of Nostoc punctiforme, a multicellular symbiotic
cyanobacterium. Photosynth Res 70: 85-106, 2001.
33. Mi S and Roberts RJ. How M.MspI and M.HpaII decide which base to methylate.
Nucleic Acids Res 20: 4811-4816, 1992.
34. Murray NE. Type I restriction systems: sophisticated molecular machines (a legacy
of Bertani and Weigle). Microbiol Mol Biol Rev 64: 412-434, 2000.
35. Nicholls A, Sharp KA, and Honig B. Protein folding and association: insights from
the interfacial and thermodynamic properties of hydrocarbons. Proteins: Stuct Func and
Genet 11: 281-296, 1991.
36. Nobusato A, Uchiyama I, and Kobayashi I. Diversity of restriction-modification
gene homologues in Helicobacter pylori. Gene 259: 89-98, 2000.
37. Piechula S, Waleron K, Swiatek W, Biedrzycka I, and Podhajska AJ. Mesophilic
cyanobacteria producing thermophilic restriction endonucleases. FEMS Microbiol Lett
198: 135-140, 2001.
38. Pollock DD and Taylor WR. Effectiveness of correlation analysis in identifying
protein residues undergoing correlated evolution. Protein Eng 10: 647-657, 1997.
39. Reinisch KM, Chen L, Verdine GL, and Lipscomb WN. The crystal structure of
HaeIII methyltransferase convalently complexed to DNA: an extrahelical cytosine and
21
rearranged base pairing. Cell 82: 143-153, 1995.
40. Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, Blumenthal
RM, Degtyarev SK, Dryden DTF, Dybvig K, Firman K, Gromova ES, Gumport RI,
Halford SE, Hattman S, Heitman J, Hornby DP, Janulaitis A, Jeltsch A, Josephsen J,
Kiss A, Klaenhammer TR, Kobayashi I, Kong H, Kruger DH, Lacks S, Marinus MG,
Miyahara M, Morgan RD, Murray NE, Nagaraja V, Piekarowicz A, Pingoud A,
Raleigh E, Rao DN, Reich N, Repin VE, Selker EU, Shaw P, Stein DC, Stoddard BL,
Szybalski W, Trautner TA, Van Etten JL, Vitor JMB, Wilson GG, and Xu S. A
nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases
and their genes. Nucleic Acids Res 31: 1805-1812, 2003.
41. Roberts RJ, Vincze T, Posfai J, and Macelis D. REBASE - Restriction enzymes and
DNA methyltransferases. Nucleic Acids Res 33: D230-D232, 2005.
42. Rocha EPC, Danchin A, and Viari A. Evolutionary role of restriction/modification
systems as revealed by comparative genome analysis. Genome Res 11: 946-958, 2001.
43. Ruan H, Lunnen KD, Scott ME, Moran LS, Slatko BE, Pelletier JJ, Hess EJ,
Benner J, Wilson GG, and Xu S-Y. Cloning and sequence comparison of AvaI and
BsoBI restriction-modification systems. Mol Gen Genet 252: 695-699, 1996.
44. Schouler C, Gautier M, Duusko Ehrlich S, and Chopin MC. Combinational
variation of restriction modification specificities in Lactococcus lactis. Mol Microbiol 28:
169-178, 1998.
45. Schultz J, Milpetz F, Bork P, and Ponting CP. SMART, a simple modular
architecture research tool: Identification of signaling domains. Proc Natl Acad Sci USA
95: 5857-5864, 1998.
46. Sonnhammer ELL, Eddy SR, Birney E, Bateman A, and Durbin R. Pfam:
Multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res
26: 320-322, 1998.
47. Takahashi N, Naito Y, Handa N, and Kobayashi I. A DNA methyltransferase can
protect the genome from postdisturbance attack by a restriction-modification gene
complex. J Bacteriol 184: 6100-6108, 2002.
48. Thompson JD, Higgins DG, and Gibson TJ. CLUSTAL W: Improving the
sensitivity of progressive multiple sequence alignment through sequence weighting,
22
position-specific gap penalties and weight matrix choice. Nucleic Acids Res
22:4673-4680, 1994.
49. Timar E, Groma G, Kiss A, and Venetianer P. Changing the recognition specificity
of a DNA-methyltransferase by in vitro evolution. Nucleic Acids Res 32: 3898-3903,
2004.
50. Tomii K and Kanehisa M. Analysis of amino acid indices and mutation matrices for
sequence comparison and structure prediction of proteins. Protein Eng 9:27-36, 1996.
51. Tominaga A. The site-specific recombinase encoded by pinD in Shigella dysenteriae
is due to the presence of a defective Mu prophage. Microbiol 143: 2057-2063, 1997.
52. Toyomizu M, Suzuki K, Kawata Y, Kojima H, and Akiba Y. Effective
transformation of the cyanobacterium Spirulina platensis using electroporation. J Appl
Phycol 13: 209-214, 2001.
53. Tragut V, Xiao J, Bylina EJ, and Borthakur D. Characterization of DNA
restriction-modified systems in Spirulina platensis strain pacifica. J Appl Phycol 7:
561-564, 1995.
54. Vingron M and Argos P. A fast and sensitive multiple sequence alignment algorithm.
CABIOS 5: 115-121, 1989.
55. Wilson GG and Murray NE. Restriction and modification systems. Annu Rev Genet
25: 585-627, 1991.
56. Wong WSW, Yang ZH, Goldman N, and Nielsen R. Accuracy and power of
statistical methods for detecting adaptive evolution in protein coding sequences and for
identifying positively selected sites. Genetics 168: 1041-1051, 2004.
57. Yang ZH. PAML: a program package for phylogenetic analysis by maximum
likelihood. CABIOS 13: 555-556, 1997.
58. Yang Z and Bielawski JP. Statistical methods for detecting molecular adaptation.
Trends Ecol Evol 15: 496-503, 2000.
59. Yang Z, Nielsen R, Goldman N, Pedersen AM. Codon substitution models for
heterogeneous selection pressure at amino acid sites. Genetics 15: 1600-1611, 2000.
23
Figure 1. Organization of the gene clusters potentially encoding RM system of Spirulina platensis.
Arrows represent the direction of translation and the relative sizes of ORFs deduced from analysis of
the nucleotide sequence. Gene names are given on top of the corresponding region. White arrows
indicate the flanking ORFs; other types of arrows indicate different subfamilies of RM system. “×” in
the arrow indicates the nonsense codon. Gene clusters encoding Type I RM genes are marked with
Roman numerals (I-VI); the putative Type II RM genes or solitary MTases are marked with Arabic
numerals (1-11).
24
Figure 2. RT-PCR analysis of the expression of RM genes in Spirulina platensis. See Materials
and Methods for details.
25
Figure 3. Genomic organization of RM genes in unicellular (D: Synechocystis sp. 6803, E: Synechococcus elongtus) and filamentous
cyanobacteria (A: Spirulina platensis, B: Anabaena variabilis ATCC 29413, C: Anabaena sp. 7120). Vertical bars show the locations of RM genes,
with different colors representing different Pfam domains: red- Methylase_S, light green- HSDR_N, purple- DEXDc, blue- Methylase_M, black-
N6_Mtase, orange- MethyltransfD12, dark green- REase (not the Pfam nomenclature). The long horizontal line indicates the chromosome, whereas the
short horizontal lines denote the extranuclear plamids. A vertical bar above a horizontal line indicates the transcriptional direction opposite to that below
a horizontal line. The positions and orders of RM genes in B-E were drawn based on their genome assemblies. Note that Spirulina genome (A) was still
in draft format, and RM genes were mapped onto the chromosome evenly. Connecting lines represent similar domains (Blastp search, score>90 and
E-value<e-9) between genomes. Sequence similarities of among Type I RM genes from C/D/E were implied from the dendrogram constructed through
the NJ method, shown in black. The value in this dendrogram represents the bootstrap value.
26
Figure 4. The concordance of HsdM (left) and HsdR (right) protein phylogeny constructed with
NJ method. Dashed lines link the cognate hsdM and hsdR genes in the same gene cluster. Colored
branches and symbols indicate different sources of RM genes (red: α-proteobacteria, light green:
β-proteobacteria, brown: γ-proteobacteria, purple: δ-proteobacteria, light blue: ε-proteobacteria, green:
firmicutes, grey blue: spirochaetes, grey: chlorobi, blue: cyanobacteria, dark: archaea).
27
Figure 5. Unrooted NJ tree for 16s rDNA (A) and MTase (B) in prokaryotic organisms. Colored
branches and symbols indicate different sources of RM genes (red: α-proteobacteria, light green:
β-proteobacteria, brown: γ-proteobacteria, purple: δ-proteobacteria, light blue: ε-proteobacteria, green:
firmicutes, grey blue: spirochaetes, grey: chlorobi, blue: cyanobacteria, dark: archaea).
28
Figure 6. Pairwise matrix representation of the coevolving site pairs in hsdM and hsdR. Two
amino acids characteristics, size (below the diagonal) and hydrophilicity (above the diagonal), were
used for coevolution analysis. Red color in the matrix indicates the positively coevolving site pairs;
blue color is for the negatively coevolving site pairs. The critical value for the correlation coefficients
is 0.5253 at the 99.9% level. Logos representation of an alignment of the hsdM-motif I and IV, and
hsdR-motif I, Ia, II, III from the sequences listed in Fig.4. Several mostly conserved sites, which were
excluded from coevolution analysis, were added below the logo alignment. The invariable residues
were in capital letter, and variable sites were in lowercase, with the short and black line indicating
their locations. Blue shading shows the coevolving site pairs between hsdM and hsdR, which were
fully discussed in the text.
29
Figure 7. Mapping of the positively selected sites (in blue color) in MTase onto their tertiary
structures. (A) Ribbon diagram of AV-Mtase2 derived from homology modeling. (B) A
representation of the molecular surface of the MTase dimer. DNA molecules are in orange color.
Swiss-PDBviewer (12) and Grasp2 (35) were used for all structural manipulations.