Chapter 4 Molecular adaptation of the chloroplast matK gene...

15
118 Chapter 4 Molecular adaptation of the chloroplast matK gene in Nymphaea tetragona, a critically rare and endangered plant of India 4.1 Introduction Nymphaea tetragona represents one of the naturally occurring species found in India (Mitra, 1990). Its distribution is restricted to one particular location found at Nongkrem, East Khasi Hills District, Meghalaya, India (25°28´N-91°52´E). It is considered to be rare and endangered (Tandon, 2009; Tandon et al., 2010). Because of its present status, recovery programmes have been initiated for this plant taxon (Ganeshaiah, 2005; Tandon et al., 2010). Besides N. tetragona, there are an additional nine species (both wild and cultivated) of the genus Nymphaea occurring in India (Mitra, 1990). However, traditional classification of the genus in India has received unconvincing treatments with some names inaccurately used (Cook, 1996). To vindicate Cook‟s remarks, molecular taxonomic revision of four Indian representatives of the genus Nymphaea viz. N. nouchali, N. pubescens, N. rubra and N. tetragona based on ITS, trnK intron and matK gene was conducted (discussed in chapter 5, Dkhar et al., 2010). Molecular evidence suggested probable misidentification of one specimen of N. nouchali and N. tetragona. Furthermore, a comparison of the genetic closeness among the Indian, Chinese and Russian material of N. tetragona based on the ITS region was also evaluated. The results indicated a close relationship between the Indian and the

Transcript of Chapter 4 Molecular adaptation of the chloroplast matK gene...

  • 118

    Chapter 4

    Molecular adaptation of the chloroplast matK gene in

    Nymphaea tetragona, a critically rare and endangered plant

    of India

    4.1 Introduction

    Nymphaea tetragona represents one of the naturally occurring species found

    in India (Mitra, 1990). Its distribution is restricted to one particular location found at

    Nongkrem, East Khasi Hills District, Meghalaya, India (25°28´N-91°52´E). It is

    considered to be rare and endangered (Tandon, 2009; Tandon et al., 2010). Because

    of its present status, recovery programmes have been initiated for this plant taxon

    (Ganeshaiah, 2005; Tandon et al., 2010).

    Besides N. tetragona, there are an additional nine species (both wild and

    cultivated) of the genus Nymphaea occurring in India (Mitra, 1990). However,

    traditional classification of the genus in India has received unconvincing treatments

    with some names inaccurately used (Cook, 1996). To vindicate Cook‟s remarks,

    molecular taxonomic revision of four Indian representatives of the genus Nymphaea

    viz. N. nouchali, N. pubescens, N. rubra and N. tetragona based on ITS, trnK intron

    and matK gene was conducted (discussed in chapter 5, Dkhar et al., 2010). Molecular

    evidence suggested probable misidentification of one specimen of N. nouchali and N.

    tetragona. Furthermore, a comparison of the genetic closeness among the Indian,

    Chinese and Russian material of N. tetragona based on the ITS region was also

    evaluated. The results indicated a close relationship between the Indian and the

  • 119

    Chinese representatives which is further supported by the morphological similarities

    shared between them (eFloras, 2008).

    An interesting observation made from chapter 2 was the relatively higher

    number of nonsynonymous substitutions as compared to synonymous substitutions

    detected in the matK gene of N. tetragona. The higher rate of nonsynonymous

    substitutions is usually inferred as an indication of selective pressures acting on

    protein-coding sequences. The chloroplast matK gene, coding for the maturase

    enzyme, has been reported to evolve under positive selection in some lineages of

    land plants (Hao et al., 2010). According to Hao et al. (2010), several regions (N-

    terminal region, RT domain in the middle, and domain X at the C-terminus) of matK

    have experienced molecular adaptation, which fine-tunes maturase performance.

    In this chapter, to gain insight into whether an adaptive evolutionary process

    is linked to the colonization of new habitats by N. tetragona, selective pressure in

    matK using phylogenetic and maximum likelihood analyses of codon and branch site

    substitution models was tested. This estimation was extended to members of the

    order Nymphaeales, not included in the study of Hao et al. (2010), representing a

    lineage at the base of the angiospermic tree.

    4.2 Materials and Methods

    4.4.1 Amplification of the matK gene in N. tetragona

    New primers (NytrnKJD525-F 5′ TCGGGTTGCAAAAATAAAGG 3′ and

    NytrnKJD2551-R 5′ AAACCGTGCTTGCATTTTTC 3′) were designed to amplify

    the matK gene in six additional samples of N. tetragona, besides NytN1 (DNA

    sample number corresponding to first individual/sample). The same primers

  • 120

    mentioned above including NymatKJD1853-F, stated in chapter 2, were used for

    sequencing.

    4.4.2 Taxon sampling and nucleotide sequence data

    Nucleotide sequence data of the trnK intron, matK and rbcL gene of N.

    nouchali, N. pubescens, N. rubra and N. tetragona were used. In addition, nucleotide

    sequence data of Barclaya longifolia, Euryale ferox, Nuphar advena, Ondinea

    purpurea, Victoria cruziana, Amborella trichopoda and Austrobaileya scandens,

    identified as the most basal angiosperms, were retrieved from GenBank.

    4.4.3 Protein blast

    The amino acid sequence data of the chloroplast matK and rbcL gene of N.

    tetragona were subjected to protein blast (blastp) available at www.ncbi.nlm.nih.gov.

    4.4.4 Molecular evolutionary analysis

    In order to conduct phylogenetic maximum likelihood analysis of selection in

    chloroplast genes, a phylogenetic tree was reconstructed based on the chloroplast

    trnK intron, matK and rbcL gene using maximum likelihood method implemented in

    Phylip (Felsentein, 2004).

    Molecular adaptation tests on the chloroplast matK codon sites were

    performed using maximum likelihood models and programs included in PAML ver.

    4.3 (Yang, 2007). Six nonsynonymous sites detected in the matK gene of N.

    tetragona are indicated in Fig. 33. Five site-specific codon substitution models: null

    models for testing positive selection (M1A, M7 and M8A) and models allowing for

    http://www.ncbi.nlm.nih.gov/

  • 121

    Fig. 33. Nonsynonymous substitution detected in the matK gene of N. tetragona. The

    sequencing chromatogram of the matK gene of N. tetragona along with the

    corresponding nucleotide sequences of N. tetragona, other Nymphaea species, and

    related genera are indicated. Six nonsynonymous substitution sites were identified.

    The transformation of amino acid at these sites among the species included is

    indicated. A brief explanation of the significance of such an observation is indicated

    within the box.

  • 122

    positive selection (M2A and M8) were performed. Furthermore, to provide evidence

    whether positive selection is linked to the colonization of N. tetragona, the branch

    site models (Model A with ω fixed or estimated) implemented in PAML ver 4.3 were

    used. The likelihood ratio test (LRT) was used to compare these alternative models.

    4.3 Results

    4.3.1 Amplification and sequencing of the matK gene

    Amplification of the matK gene in six additional samples of N. tetragona

    yielded one single amplified product of approximately 1900bp in size (Fig. 34).

    Sequencing results revealed exact sequence matches among all individuals/samples

    of N. tetragona. Therefore, it could be assumed that a single haplotype of the matK

    gene is maintained throughout the population. Fig. 35 shows the single population at

    Umjapung Village, Nongkrem, East Khasi Hills District, Meghalaya, India where N.

    tetragona is found.

    4.3.2 Protein blast (blastp) of the Rubisco and maturase K proteins

    Protein blast of the Rubisco (rbcL gene encoded protein) amino acid

    sequence data of N. tetragona corresponds to the large subunit of ribulose-1,5-

    bisphosphate carboxylase/oxygenase of N. alba (Accession no. CAF28601). No

    partitioning of the protein is indicated. However, maturase K (matK gene encoded

    protein) blastp showed two regions identified as the N-terminal region, whose

    function is not known, and Domain X which is involved in splicing. The intervening

    region between the N-terminal and Domain X is called RT Domain (Fig. 36).

    http://www.ncbi.nlm.nih.gov/protein/50250335?report=genbank&log$=protalign&blast_rank=5&RID=5CV5UVDM012

  • 123

    Fig. 34. PCR amplified product of the matK gene in six additional individuals

    (NytN2–NytN7) of N. tetragona. The amplified product is approximately 1900bp in

    size. Ladder ranging from 500–5000bp in size was used as standard.

  • 124

    Fig. 35. Location at Umjapung Village, Nongkrem, East khasi Hills District. where

    N. tetragona is found. This is the only population of N. tetragona in India. Note the

    cultivation surrounding the pond.

  • 125

    Fig. 36. Protein blast (blastp) of the Rubisco and maturase K of N. tetragona. No

    partitioning of the Rubisco protein is indicated. For maturase K, blastp showed two

    regions identified as the N-terminal region and Domain X. The intervening region

    between the N-terminal and Domain X is called RT Domain.

  • 126

    4.3.3 Phylogenetic tree reconstruction

    Maximum likelihood analysis of the combined datasets yielded a

    phylogenetic tree with log likelihood value of –10921.95 (Fig. 37).

    4.3.4 Positive selection in matK gene

    For detecting positive selection using codon substitution models, three

    likelihood ratio tests were performed comparing M1A (nearly neutral) with M2A

    (positive selection), M7 (beta) with M8 (beta and ωs ≥ 1), and M8A (beta and ωs =1)

    with M8 (beta and ωs ≥ 1). Because positive selection operates more strongly at some

    sites (Aguileta et al., 2009) the distribution of sites influenced by selection for all

    three regions (N-terminal region, RT domain, and domain X) of the matK gene was

    evaluated (Table 19). For N-terminal region, model M2A indicated that 86.8% of the

    sites are under purifying selection (ω = 0.248), and 13.2% of the sites are under

    positive selection (ω = 1.778). In model M8, 13.0% of the sites are under selective

    pressure (ω = 1.791). For RT domain, model M8 estimated an ω value of 46.57 for

    0.01% of codon sites. Codon substitution models did not identify any sites under

    positive selective for domain X. Two LRTs, M7-M8 and M8A-M8, were significant

    at the 0.01 and 0.05 significance level respectively. Comparing M8 against the null

    model M8A is more robust than M1A vs. M2A or M7 vs. M8 (Swanson et al., 2003).

    Branch-site model A (ω2 =1 fixed) with branch-site model A (ω2 estimated)

    was compared. Each branch in the phylogenetic tree was specified a priori as

    foreground branches. For N-terminal region, model A (ω2 estimated) for N. tetragona

    branch indicated that 2.09% of sites are under selective pressure with ω2 = 23.099

    (Table 20). Bayes Empirical Bayes (BEB) analysis showed three positive sites with

  • 127

    Fig. 37. Maximum likelihood tree of combined chloroplast DNA dataset (3859 bp) using Phylip. Three positive sites with probability

    for foreground (N. tetragona) estimated using branch-site models of PAML are indicated. Amino acid replacements at these sites are

    also indicated at the branch (G: Glycine, E: Glutamic acid, Q: Glutamine, H: Histidine, L: Leucine, F: Phenylalanine). Numbers at nodes

    indicate bootstrap values. Prob, probability.

  • 128

    Table 19. Estimation of log-likelihood (LnL) values and parameters for each region of the matK gene using site-specific codon

    substitution models implemented in PAML.

    1 p, proportion of codon sites with ω; ω, dN/dS. For models M7, M8 and M8A, ω was estimated from a beta distribution B(p, q).

    Model N-terminal Region RT Domain Domain X

    LnL Parameters1

    LnL Parameters LnL Parameters

    Site-specific models

    M0: one ω –3163.49 ω = 0.38797 –229.30 ω = 0.32206 –1252.89 ω = 0.32610

    M1A: nearly neutral –3148.00 p0 = 0.74229, ω0 = 0.18594

    –229.30 p0 = 0.99999, ω0 = 0.32206

    –1250.61 p0 = 0.67267, ω0 = 0.11716

    p1 = 0.25771, ω1 = 1.00000 p1 = 0.00001, ω1 = 1.00000 p1 = 0.32733, ω1 = 1.00000

    M2A: positive selection –3146.07 p0 = 0.86780, ω0 = 0.24814

    –229.30 p0 = 1.00000, ω0 = 0.32206

    –1250.61 p0 = 0.67267, ω0 = 0.11716

    p1 = 0.00000, ω1 = 1.00000 p1 = 0.00000, ω1 = 1.00000 p1 = 0.31251, ω1 = 1.00000

    p2 = 0.13220, ω2 = 1.77808 p2 = 0.00000, ω2 = 1.00000 p2 = 0.01481, ω2 = 1.00000

    M7: beta –3150.77 p = 0.54661, q = 0.79966 –229.31 p = 47.39745, q = 99.00000 –1249.89 p = 0.38014, q = 0.66141

    M8: beta & ωs ≥ 1 –3146.10 p0 = 0.87015

    –229.31 p0 = 0.99999

    –1249.89

    p0 = 0.99999 p = 33.15470, q = 99.00000 p = 47.39742, q = 99.00000 p = 0.38013, q = 0.66140

    p1 = 0.12985, ω1 = 1.79169 p1 = 0.00001, ω1 = 46.75087 p1 = 0.00001, ω1 = 1.00000

    M8A: beta & ωs =1 –3148.07 p0 = 0.74402

    –229.31 p0 = 0.99999

    –1249.89 p0 = 0.99999

    p = 22.99702, q = 99.00000 p = 47.39723, q = 99.00000 p = 0.38014, q = 0.66142

    p1 = 0.25598, ω1 = 1.00000 p1 = 0.00001, ω1 = 1.00000 p1 = 0.00001, ω1 = 1.00000

  • 129

    Table 20. Estimation of log-likelihood (LnL) values and parameters for each region of the matK gene using branch-site models

    implemented in PAML selecting N. tetragona as foreground branch.

    1 p, proportion of codon sites with ω; ω, dN/dS.

    Model N-terminal Region RT Domain Domain X

    LnL Parameters1 LnL Parameters LnL Parameters

    Foreground: Nymphaea tetragona

    Model A

    (ω2 = 1 fixed) –3148.0

    p0 = 0.73633, ω0 = 18596 –229.3

    p0 = 1.00000, ω0 = 0.32206 –1250.6

    p0 = 0.67267, ω0 = 0.11716 p1 = 0.25529, ω1 = 1.00000 p1 = 0.00000, ω1 = 1.00000 p1 = 0.32733, ω1 = 1.00000

    p2+p3 = 0.00839, ω2 = 1.00000 p2+p3 = 0.00000, ω2 = 1.00000 p2+p3 = 0.00000, ω2 = 1.00000

    Model A (ω2 = estimated)

    –3147.8 p0 = 0.73321, ω0 = 0.18697

    –229.3 p0 = 1.00000, ω0 = 0.32206

    –1250.6 p0 = 0.67267, ω0 = 0.11716

    p1 = 0.24589, ω1 = 1.00000 p1 = 0.00000, ω1 = 1.00000 p1 = 0.32733, ω1 = 1.00000

    p2+p3 = 0.0209, ω2 = 23.09944 p2+p3 = 0.00000, ω2 = 1.17397 p2+p3 = 0.00000, ω2 = 1.00000

  • 130

    probability value of 0.607, 0.527, and 0.582 respectively. Branch-site models for the

    remaining regions of matK gene had similar log-likelihood values.

    4.3.5 Positive selection in rbcL gene

    For rbcL gene, model M2A indicated that 94.6% of the sites are under

    purifying selection (ω = 0.017), and 5.3% of the sites are under positive selection (ω

    = 1.400). In model M8, 5.3% of the sites are under selective pressure (ω = 1.401).

    The results indicated a relatively lower selective pressure on the rbcL gene as

    compared to matK. Identical log-likelihood values were observed while selecting N.

    tetragona as foreground lineage under branch-site models.

    4.4 Discussion

    4.4.1 Selective pressure during colonization of N. tetragona

    N. tetragona is well distributed globally, found throughout China (Ma et al.,

    2008; Hu et al., 2009), Finland (Nurminen, 2003), Japan (Takamura et al., 2003),

    and Russia (Volkova et al., 2010); but its location in India is restricted to one

    particular location. It is interesting to note that plants of N. tetragona found in India

    are closely related to plants available in China (discussed in chapter 5, Dkhar et al.,

    2010). Therefore, evidences from molecular evolutionary analysis suggested that the

    plant taxon migrated from China, and the migratory processes involved might have

    brought about some changes at both the DNA and protein sequence level. This

    adaptive response would have rendered slight advantage for the plant to survive in

    new habitat. Such adaptive changes at the DNA and protein sequence levels of rbcL

  • 131

    gene have been reported in Schidea (Kapralov and Filatov, 2006) and Potamogeton

    (Iida et al., 2009).

    Out of six nonsynonymous substitution detected in the matK gene of N

    tetragona, three showed selection pressure with probability value of 0.607, 0.527,

    and 0.582, respectively. The three selected sites are located at the N-terminal region

    of matK gene, whose function is not known. Molecular evolutionary analysis did not

    identify any sites under positive selective for domain X, a relatively more conserved

    and important functionally, as it determines the maturase activity of matK proteins

    (Hausner et al., 2006). Similarly, RT domain showed relatively lower percentage of

    sites (0.01%) with positive selection as compared to N-terminal region (13.2%) of

    matK gene. Such observation suggests that selective pressures are likely to occur in

    regions that are functionally unimportant. Hao et al., (2010) observed that more

    positively selected sites were detected in the N-terminal region than in the RT

    domain and domain X of matK gene.

    4.4.2 Implications for the conservation of N. tetragona

    Sustainable utilization of plant genetic resources for food and agriculture has

    been increasingly discussed at both national and international forums. Besides

    exploitation, conservation of plant genetic resources has become an integral part of

    these discussions. Conservation aims at maintaining the diversity of living

    organisms, their habitat and the interrelationship between organisms and their

    environment. For achieving such goals, appropriate conservation strategies have to

    be adopted. Determining the genetic makeup of a particular plant species is of critical

    importance when planning a suitable conservation strategy. The present chapter

  • 132

    provided evidence for molecular adaptation of matK gene in N. tetragona, a critically

    rare and endangered plant of India. Such adaptive changes at the molecular level

    might have been associated with the adaptation of N. tetragona to varying ecological

    conditions. So, threat to the existence of this plant taxon could primarily be attributed

    to being anthropogenic, brought about through large scale cultivation at the vicinity

    of the location (Fig. 35). Conservation of N. tetragona can be targeted at controlling

    such activities or the probable translocation to another site.