Sympatric speciation in Mountain Roses (Metrosideros) on an oceanic island

Owen G. Osborne1,2, Tane Kafle1,3, Tom Brewer1,4, Mariya P. Dobreva1, Ian Hutton5, Vincent Savolainen1

1 Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, SL5 7PY, UK

2 Current address: Molecular Ecology and Fisheries Genetics Laboratory, Environment Centre Wales, School of Natural Sciences, Bangor University, Bangor, LL57 2UW, UK

3 Current address: Department of Ecology and Evolution, University of Lausanne, Biophore, 1015 Lausanne, Switzerland

4 Current address: Department of Zoology, University of Oxford, Zoology Research and Administration Building, 11a Mansfield Road, Oxford, OX1 3SZ, UK

5 Lord Howe Island Museum, Lord Howe Island, NSW 2898, Australia

Abstract

Shifts in flowering time have the potential to act as strong prezygotic reproductive barriers in plants. We investigate the role of flowering time divergence in two species of mountain rose (Metrosideros) endemic to Lord Howe Island, Australia, a minute and isolated island in the Tasman Sea. Metrosideros nervulosa and M. sclerocarpa are sister species that have divergent ecological niches on the island but grow sympatrically for much of their range, and they likely speciated in situ on the island. We used flowering time data and population genomic analyses of population structure and selection to investigate their evolution, with a particular focus on the role of flowering time in their speciation. Population structure analyses showed the species are highly differentiated and appear to be in the very late stages of speciation. We found flowering times of the species to be significantly displaced, with M. sclerocarpa flowering 53 days later than M. nervulosa. Furthermore, analyses of selection showed that flowering time genes are under selection between the species. Thus, prezygotic reproductive isolation is mediated by flowering time shifts in the species, and likely evolved under selection, to drive the completion of speciation within a small geographical area.

Keywords

Flowering time; phenology shifts; reproductive isolation; reproductive character displacement; reinforcement; ecological speciation.

Background

Many examples of incomplete speciation are known, in which there is some degree of reproductive isolation (RI) but gene flow still occurs [1]. The path between such a state of incomplete speciation and complete RI remains relatively poorly understood, however. Divergent local adaptation can lead to postzygotic RI between incipient species via immigrant inviability and reduced hybrid fitness [2,3]. It may be that divergent selection can lead to the completion of speciation solely through reduced hybrid fitness. However, in many cases it is likely that the further evolution of prezygotic RI may be necessary [1].

In flowering plants, differences in floral traits between species are the main source of prezygotic isolation. These can include differences in flowering phenology, pollinator specificity, or pollen-stigma interactions that make heterospecific pollen tubes less likely to reach the ovule [4,5]. All of these can arise by genetic drift in the absence of gene flow, such as during allopatric separation of the species. In cases where gene flow is ongoing, however, the evolution of prezygotic RI via floral trait differences is more likely to be the result of selection. Selection can indirectly increase prezygotic RI via pleiotropy - when there is ecologically-based divergent selection on a reproductive trait, or when ecologically-based divergent selection on a gene pleiotropically affects another reproductive trait (often called a ‘magic trait’ mechanism [6]). Alternatively, selection can directly increase prezygotic RI via reinforcement [7]. In this latter scenario, traits are selected that avoid the wasteful production of unfit hybrids by decreasing the chance of interspecific mating. While many well-known cases of reinforcement involve secondary contact, it has also been demonstrated in cases where gene flow has been ongoing throughout divergence [8].

Lord Howe Island (LHI) is well-known among evolutionary biologists for its endemic palm trees, Howea belmoreana and H. forsteriana, which represent one of the most convincing cases of sympatric speciation in nature [9]. LHI palms are reproductively isolated by both postzygotic (likely via selection against hybrids, [8]) and prezygotic RI (through flowering time differences [10]). It has previously been hypothesised that prezygotic RI in Howea may have evolved via a pleiotropic mechanism, because genes involved in both soil adaptation and flowering time regulation show evidence of adaptive divergence between the species [11,12].

Like Howea, the two endemic species of Metrosideros (Myrtaceae) on LHI, M. nervulosa C.Moore & F.Muell (1873) and M. sclerocarpa J.W.Dawson (1990), are likely to have speciated in situ on the island [13]. They are sister species with an estimated divergence time of ~3.53 million years ago and have identical chromosome numbers, ruling out polyploid speciation [13]. Both species have showy flowers; they are believed to be outcrossing, wind dispersed and insect and/or bird pollinated (Hutton, pers. obs.), although no systematic data exist to confirm this. They have distinct morphologies (Fig. 1B and C) and ecological preferences, with M. nervulosa occupying drier, more exposed sites at higher elevations (approx. 57 – 875 m) and M. sclerocarpa preferring wetter conditions close to streams at lower elevations (approx. 10 – 481 m) [14]. At higher altitudes in the southern mountains of LHI, M. nervulosa grows in isolation, and in some lower altitude locations, particularly around creeks, only M. sclerocarpa is found [15]. For a substantial part of their range, however, the two species are sympatric. Despite their overlapping geographical distributions, hybridisation appears to be infrequent [15]. Previous work has noted that M. nervulosa flowers earlier in the season (October – January) than M. sclerocarpa (December – February), although this has not been quantified directly. Flowering time variation thus has the potential to act as a mechanism of prezygotic RI in the species. Furthermore, while previous work has found genetic evidence of divergent selection in the form of associations between AFLP markers and various ecological variables [15], no methods that allow the identification of genes likely to be under selection have so far been employed. Such information could provide vital clues to the selective drivers of their divergence and speciation.

In this study, we investigated the evolution of prezygotic RI in LHI Metrosideros with particular focus on flowering phenology. Firstly, we sequenced ten transcriptomes from each of M. nervulosa and M. sclerocarpa, the first high-throughput sequence data for the species. We used these data to investigate the level of differentiation and population structure within and between the species and to identify genes putatively under divergent selection. Secondly, we looked for flowering time genes in their transcriptomes, by homology with those of Arabidopsis thaliana, and compared their selective landscape to the rest of the transcriptome. Finally, we compiled a flowering time dataset from a combination of occurrence data and herbarium specimens from which collection date, reproductive state, and collection location could be unambiguously inferred. We then used these data to quantify flowering time divergence between the species in both allopatric and sympatric areas of the island. Together, the data provide the first evidence that flowering time divergence is under direct selection in these species.

Methods

Tissue sampling and transcriptome sequencing

Mature leaf tissue was collected from ten adult individuals each of Metrosideros nervulosa and M. sclerocarpa between the 12th and 14th of April 2016 (Fig. 1, Table S1). Tissue was immediately dissected into 5 mm² sections and stored in RNAlater (Sigma) at -20°C. Leaf tissue samples were sent to BGI Tech Solutions (Hong Kong) for RNA extraction, Illumina library preparation, and paired-end sequencing. Paired-end 100 base pair (bp) libraries were multiplexed and sequenced on an Illumina HiSeq 4000 sequencer. For each species, ten individuals were sequenced. One individual of each species was sequenced to a depth of > 40 million read-pairs, whereas the rest were sequenced to a depth of > 10 million read-pairs. Only the individuals sequenced to > 40 million read-pairs were used for transcriptome assembly, whereas all individuals were used for variant calling. All new data are available from the Sequence Read Archive (SRA) under the accession number PRJNA588239. We also used publicly available RNA-seq data for Metrosideros polymorpha, a Hawaiian endemic, to root our phylogenetic tree (available from SRA under the accession number PRJDB4443).

Bioinformatic processing

Quality control and error correction steps were conducted as recommended by MacManes et al. [16]. The quality of the raw reads was assessed using FASTQC [17]. Sequencing errors within the raw reads were corrected using Rcorrector v1.0.2 [18] with default settings. Trimmomatic v0.36 [19] was used to remove Illumina adapter sequences and trim regions with Phred scores which averaged ≤ 5 across sliding windows of 4 base pairs [16].
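The sliding-window trimming criterion described above can be illustrated with a minimal sketch. This is not Trimmomatic's implementation: the function name `sliding_window_trim` and the example quality scores are hypothetical, and the sketch simply cuts a read at the start of the first 4-base window whose mean Phred score is ≤ 5.

```python
from statistics import mean


def sliding_window_trim(quals, window=4, threshold=5):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean Phred score is <= threshold (the cutoff described in the text).
    """
    for start in range(len(quals) - window + 1):
        if mean(quals[start:start + window]) <= threshold:
            return start
    return len(quals)  # no low-quality window found: keep the whole read


# Hypothetical read: good quality followed by a low-quality tail.
example_quals = [30, 30, 30, 30, 2, 2, 2, 2]
kept = sliding_window_trim(example_quals)
```

With so permissive a threshold, only windows made up almost entirely of very low-quality bases trigger a cut, so in this toy read all four high-quality bases are kept.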

To produce a high-quality reference transcriptome for LHI Metrosideros, we followed the multiple-algorithm, multiple-Kmer approach of Osborne et al. [12]. We used the individual which was sequenced to > 40 million read-pairs for each species to produce separate transcriptome assemblies using eight different assembly algorithms (BinPacker v.1.0 [20], Bridger v.2014-12-01 [21], IDBA-tran v.1.1.0 [22], Oases v.0.2.08 [23], Shannon v.0.0.2 [24], SOAPdenovo-Trans v.1.0.4 [25], TransABYSS v.1.5.5 [26] and Trinity v.2.4.0 [27]) using either three (K = 19, 25, 33 for Bridger, BinPacker and Trinity) or six (K = 21, 31, 41, 51, 61, 71 for IDBA-tran, Oases, Shannon, SOAPdenovo-Trans and TransABYSS) different Kmer lengths. This resulted in 68 initial assemblies (34 for each species: only one was produced by IDBA-tran, because this algorithm combines all Kmer-length assemblies into a single assembly, but one was produced for each Kmer length for the other algorithms). TransDecoder v.3.0.1 was then used to identify coding sequences (CDS) for each of these assemblies [28], and contigs without a CDS of at least 100 bp in length were discarded. Retained CDS for all assemblies, for each species separately, were then clustered using CD-HIT-EST v.4.6.1 [29] and the longest sequence for each cluster was retained to produce an assembly for each species. These species-specific assemblies were then further filtered to remove transcripts which were recovered by fewer than two assembly algorithms and in fewer than four individual assemblies (i.e. algorithm plus Kmer length). The two species-specific assemblies were then matched using a reciprocal-best-BLAST (RBB) approach and a combined assembly was produced following the method in [12].
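The count of 68 initial assemblies follows directly from the assembler and Kmer combinations listed above. The short sketch below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# Kmer lengths per assembler group, as listed in the text.
THREE_K = [19, 25, 33]
SIX_K = [21, 31, 41, 51, 61, 71]

kmers_by_assembler = {
    "Bridger": THREE_K,
    "BinPacker": THREE_K,
    "Trinity": THREE_K,
    "IDBA-tran": SIX_K,
    "Oases": SIX_K,
    "Shannon": SIX_K,
    "SOAPdenovo-Trans": SIX_K,
    "TransABYSS": SIX_K,
}


def assemblies_per_species(kmers):
    """Count assemblies produced for one species."""
    total = 0
    for name, ks in kmers.items():
        # IDBA-tran merges all Kmer lengths into a single assembly.
        total += 1 if name == "IDBA-tran" else len(ks)
    return total


per_species = assemblies_per_species(kmers_by_assembler)  # 3*3 + 4*6 + 1
total_assemblies = 2 * per_species  # two species
```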

To estimate transcript-locus relationships and further reduce redundancy, we used Corset [30]. We first mapped the reads from M. nervulosa and M. sclerocarpa to the transcriptome with STAR v.2.5.3a [31], allowing unlimited numbers of mappings per read. The resulting mappings were then input into Corset v.1.06. The approach uses reads that map to multiple contigs to group contigs into clusters that are likely to represent different splice-variants from the same locus [30]. To reduce the chance of gene regions present in multiple splice-variants being analysed multiple times, we retained only the contig with the longest CDS from each Corset cluster for further analysis.
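As a minimal sketch of the final selection step, assuming the Corset cluster assignments and CDS lengths have already been parsed into dictionaries (the names `cluster_of`, `cds_length` and the contig identifiers are hypothetical), the longest-CDS representative per cluster could be chosen as follows.

```python
def longest_cds_per_cluster(cluster_of, cds_length):
    """Map each cluster to the contig with the longest CDS.

    cluster_of: contig name -> cluster name
    cds_length: contig name -> CDS length in bp
    Ties keep the contig encountered first.
    """
    best = {}
    for contig, cluster in cluster_of.items():
        current = best.get(cluster)
        if current is None or cds_length[contig] > cds_length[current]:
            best[cluster] = contig
    return best


# Hypothetical example: two splice variants in one cluster, one singleton.
cluster_of = {"contig_a": "cluster_1", "contig_b": "cluster_1", "contig_c": "cluster_2"}
cds_length = {"contig_a": 900, "contig_b": 1500, "contig_c": 300}
representatives = longest_cds_per_cluster(cluster_of, cds_length)
```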

For variant calling, all BAM-formatted read-mappings (as well as those from M. polymorpha, which were not used for Corset) were processed using Picard tools v.2.6.0 (available from http://broadinstitute.github.io/picard) to add read groups, sort reads by coordinate, and mark duplicates. Reads with multiple mappings were removed. Sequences overhanging into intronic regions were then clipped with GATK SplitNCigarReads to further reduce the likelihood of calling false variants. The samtools-bcftools v.1.3.1 pipeline was then used for variant calling and filtering. The mpileup module of samtools, and the call and filter modules of bcftools (Li et al., 2009), were used to identify SNPs. Reads with a PHRED-scaled mapping quality of under 20 were removed, and variants were retained if they had a PHRED-scaled genotype quality above 20, an overall depth of at least three with a minimum of two reads per allele, and were at least 3 bp from the nearest indel. VCFs were converted into FASTA-formatted alignments using the vcf2fas script (Bruno Nevado, available from https://github.com/brunonevado/vcf2fas).
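The variant retention criteria can be expressed as a simple predicate. The sketch below is not the bcftools filter expression used in the study; the `Call` record and its field names are hypothetical, and it assumes that the depth of three and the two reads per allele are minimum values.

```python
from dataclasses import dataclass


@dataclass
class Call:
    genotype_quality: int      # PHRED-scaled genotype quality
    depth: int                 # total reads covering the site
    min_reads_per_allele: int  # fewest reads supporting any called allele
    distance_to_indel: int     # bp to the nearest indel


def passes_filters(call, min_gq=20, min_depth=3, min_allele_reads=2, min_indel_dist=3):
    """Apply the thresholds described in the text to a single call."""
    return (
        call.genotype_quality > min_gq          # quality strictly above 20
        and call.depth >= min_depth             # assumed minimum depth
        and call.min_reads_per_allele >= min_allele_reads
        and call.distance_to_indel >= min_indel_dist
    )


good_call = Call(genotype_quality=35, depth=12, min_reads_per_allele=4, distance_to_indel=50)
near_indel = Call(genotype_quality=35, depth=12, min_reads_per_allele=4, distance_to_indel=2)
low_quality = Call(genotype_quality=20, depth=12, min_reads_per_allele=4, distance_to_indel=50)
```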

All transcripts were annotated by BLAST. The CDS for each gene was searched against the proteomes of Arabidopsis thaliana and Eucalyptus grandis (a close relative of Metrosideros with a high quality genome assembly) which were downloaded from Phytozome [32] (available from https://phytozome.jgi.doe.gov; last accessed July 23, 2019) using BLASTx v2.2.25 [33] with an E-value cutoff of 0.001. Transcripts were annotated with the Gene Ontology (GO) terms for their top Arabidopsis BLASTx hit, which were also downloaded from Phytozome [32]. We identified genes as potentially involved in flowering time (hereafter “flowering time genes”) as those whose closest Arabidopsis homologue was in the FLOR-ID flowering time gene database [34].
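The flowering time gene assignment amounts to taking each transcript's best Arabidopsis hit and checking it against a reference gene list. A minimal sketch follows, assuming BLASTx results have been parsed into tuples and that the best hit is the one with the highest bit score; the identifiers and the `flowering_db` set are hypothetical placeholders, not FLOR-ID entries.

```python
def flag_flowering_genes(hits, flowering_db, max_evalue=1e-3):
    """Return transcripts whose top Arabidopsis hit is in flowering_db.

    hits: iterable of (transcript, arabidopsis_gene, evalue, bitscore)
    Hits with an E-value above max_evalue are ignored.
    """
    top = {}
    for transcript, gene, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue
        if transcript not in top or bitscore > top[transcript][1]:
            top[transcript] = (gene, bitscore)
    return {t for t, (gene, _) in top.items() if gene in flowering_db}


# Hypothetical parsed hits and reference list.
hits = [
    ("tx1", "gene_A", 1e-50, 200.0),
    ("tx1", "gene_B", 1e-10, 80.0),
    ("tx2", "gene_B", 1e-30, 150.0),
    ("tx3", "gene_A", 0.5, 20.0),  # fails the E-value cutoff
]
flowering_db = {"gene_A"}
flowering_transcripts = flag_flowering_genes(hits, flowering_db)
```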

Phylogenetic and population genetic analysis

We took three approaches to analyse population structure and species differentiation. Firstly, we concatenated all transcript alignments (including sequences from the outgroup Metrosideros polymorpha) and used the data to produce an individual-level phylogeny with RAxML v.8.1.17 under the GTR-GAMMA model of sequence evolution. The resulting phylogenetic tree was rooted with M. polymorpha. Secondly, we used a Bayesian approach to estimate both the most likely number of genetic clusters and the proportion of membership of each individual to each cluster. To reduce the effect of linkage, only the SNP with the least missing data per contig was included in the analysis. The fastSTRUCTURE software [35] was run on this dataset using the “simple priors” mode and values of K (number of clusters) between 1 and 20. The most likely number of clusters was then selected using the five-fold cross-validation procedure implemented in the chooseK.py script included with the software. Cluster membership for each individual was then calculated using the “logistic priors” mode. Thirdly, we implemented a multidimensional scaling approach on the same reduced SNP dataset using the mds-plot function in PLINK v.1.9 [36].
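The SNP thinning step used to reduce linkage can be sketched as follows, assuming each SNP has been summarised as a (contig, position, number of individuals with missing genotypes) tuple. The function name and example values are hypothetical, and ties are resolved by keeping the first SNP encountered.

```python
def thin_snps(snps):
    """Keep, for each contig, the position of the SNP with the least missing data.

    snps: iterable of (contig, position, n_missing)
    """
    best = {}
    for contig, position, n_missing in snps:
        if contig not in best or n_missing < best[contig][1]:
            best[contig] = (position, n_missing)
    return {contig: position for contig, (position, _) in best.items()}


# Hypothetical SNP table.
snps = [("contig_1", 10, 3), ("contig_1", 55, 0), ("contig_2", 7, 1)]
thinned = thin_snps(snps)
```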

To estimate the impact of selection across the genome, we calculated Weir and Cockerham’s FST [37] and net divergence (dXY) [38] between species, and diversity (π) [38] and Tajima’s D (Dtaj) [39] within each species, using the PopGenome package in R [40]. Each of these statistics was calculated per gene. Within-species diversity was compared globally using a Mann-Whitney U test. Negative values of Dtaj can indicate genetic bottlenecks (if genome-wide) or positive selection (if genomically localised) [39]. Therefore, we considered genes with Dtaj in the bottom 5th percentile (below -1.148 and -1.401 for M. nervulosa and M. sclerocarpa, respectively) to be Dtaj outliers and those in the bottom 1st percentile (below -1.535 and -1.755 for M. nervulosa and M. sclerocarpa, respectively) to be strong Dtaj outliers. Since FST is a relative measure of differentiation, locally high FST can be the result of either high between-species divergence (i.e. dXY), low within-species diversity (i.e. π) or a combination of these. Genomic regions in which high FST is driven by high dXY can be interpreted as having a reduced effective migration rate between hybridising species, which may be the result of divergent selection (i.e. “speciation islands” [41]). Genomic regions in which high FST is driven by low π are likely to have undergone repeated selective sweeps [41]. To distinguish these cases, we used the composite outlier analysis approach implemented in the R package MINOTAUR [42]. We implemented three outlier scans in MINOTAUR, using only genes for which at least 50% of individuals from each species had less than 50% missing data, to guard against inaccurate estimates of summary statistics due to low numbers of individuals. To identify genes which were highly differentiated due to high divergence, we used MINOTAUR to find genes that were outliers for both high FST and high dXY. To identify genes that were highly differentiated due to low diversity, we used MINOTAUR to search for genes that were outliers for high FST and low π, for each of the species separately. All four statistics were first converted to fractional rank-based P-values (with appropriate one-tailed tests: right-tailed for FST and dXY and left-tailed for π) and outlier status was quantified using Mahalanobis distance, the approach that has been shown to be most effective for summary-statistic data [43]. Genes above the 95th percentile of Mahalanobis distance for each scan were considered to be outliers and genes above the 99th percentile of Mahalanobis distance were considered to be strong outliers [42].
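The composite outlier idea can be illustrated with a small stand-alone sketch. This is not MINOTAUR's code: the rank-to-P-value formula (rank divided by the number of genes), the −log10 transformation before computing distances, and the toy FST and dXY values are all assumptions made for illustration. The sketch converts two statistics to right-tailed rank-based P-values and then computes each gene's Mahalanobis distance from the multivariate mean.

```python
import math


def right_tail_rank_p(values):
    """Fractional rank-based P-values: the largest value gets the smallest P."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    p = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p[i] = rank / n
    return p


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = v^T S^-1 v, with S^-1 = (1/det) * [[syy, -sxy], [-sxy, sxx]]
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Toy per-gene statistics; the last gene is high for both FST and dXY.
fst = [0.10, 0.20, 0.15, 0.25, 0.30, 0.95]
dxy = [0.0020, 0.0010, 0.0030, 0.0025, 0.0015, 0.0200]

x = [-math.log10(p) for p in right_tail_rank_p(fst)]
y = [-math.log10(p) for p in right_tail_rank_p(dxy)]
distances = mahalanobis_2d(x, y)
top_gene = max(range(len(distances)), key=lambda i: distances[i])
```

In a real scan, genes whose distance exceeds the 95th or 99th percentile of the distance distribution would be flagged, as described in the text. Note that distance from the mean is direction-agnostic, so a practical implementation would also need to confirm that flagged genes lie in the intended tail.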

To gain insight into the functional significance of genes likely to be under selection, outliers from each of the MINOTAUR tests, as well as Dtaj outliers for each species, were then tested for GO term enrichment using the topGO v.2.26.0 package in R [44] using the weight algorithm, a node size of 10 and Fisher’s exact tests. We used our outlier results in combination with our functional annotation results to identify candidate genes which may be involved in flowering time differences between the species. We considered genes that were outliers in one of the MINOTAUR tests, and which were annotated as flowering time genes, as candidates.

Analysis of flowering time

We used herbarium and occurrence data to estimate flowering time following [45]. We searched four herbarium databases and two occurrence databases (Table S2) for records of M. nervulosa and M. sclerocarpa, and also included our own previously unpublished occurrence observations. Only herbarium vouchers with flowers, and for which date of collection and geographic location were recorded, were retained. For occurrence data we only included records which included collection date and geographic location as well as a clear statement that the plant was in flower. To ensure there were no duplicate records, we removed any datapoints which had the same date, location and species, unless it was clearly stated in the records that they were different plants. Each record was classed as a single flowering time observation, the date of flowering was converted to the number of days after July 1st (which is outside the flowering period of either species) and the distributions of dates were compared between the species using Mann-Whitney U tests. Additionally, flowering occurrences for each species were divided into those from sympatric and allopatric locations based on the known ranges of the two species, and these were again compared with Mann-Whitney U tests.
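The date conversion and the test statistic can be sketched in a few lines. The example dates below are hypothetical, not records from the dataset, and the helper names are illustrative. Dates from January to June are assigned to the season that began on the previous 1 July, so that a flowering season spanning the new year stays on one continuous scale.

```python
from datetime import date


def days_after_july_first(d):
    """Days elapsed since the most recent 1 July on or before date d."""
    season_start_year = d.year if d.month >= 7 else d.year - 1
    return (d - date(season_start_year, 7, 1)).days


def mann_whitney_u(xs, ys):
    """U statistic for xs: count of (x, y) pairs with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


# Hypothetical flowering records for an early- and a late-flowering species.
early = [date(2015, 10, 20), date(2015, 11, 14), date(2015, 12, 1)]
late = [date(2015, 12, 20), date(2016, 1, 6), date(2016, 2, 2)]

early_days = [days_after_july_first(d) for d in early]
late_days = [days_after_july_first(d) for d in late]
u_late = mann_whitney_u(late_days, early_days)
```

This sketch only computes the U statistic; a P-value would normally come from a statistics library, for example SciPy's `mannwhitneyu`. In leap years, dates after 29 February shift by one day on this scale, which may or may not matter for a given dataset.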

Results

Transcriptome assembly and annotation

Following trimming and read correction, there were between 12,758,120 and 48,793,656 read pairs per individual (Table S1). Internal insert sizes for paired reads ranged from 2 to 377 (mean = 154.9 ± 52.5 [SD]). Using our multi-assembler, multi-Kmer assembly pipeline, data from the individual with the most reads per species were assembled into 62,133 and 53,403 transcripts for M. nervulosa and M. sclerocarpa, respectively. Reciprocal-best-BLAST matching of transcripts in the two assemblies resulted in 22,073 identifiable orthologs between the two species. These were further clustered into 13,344 putative genes using Corset, and the longest transcript from each putative gene formed our final transcriptome assembly. Variant calling resulted in the identification of 71,160 SNPs within CDS regions, 24,536 (34%) of which were fixed. In total, 1,468 (11%) contigs contained only fixed SNPs and 4,018 (30%) contained no SNPs. We successfully BLAST annotated 97.6% of genes to an Arabidopsis thaliana gene and 98.1% to a Eucalyptus grandis gene. In total, 55.3% of annotated Metrosideros genes were uniquely annotated to an A. thaliana gene (i.e. were the only gene which was annotated to a specific Arabidopsis gene); for Eucalyptus, which is much more closely related to Metrosideros, this figure rose to 74.15%. Overall, this represented 34.5% and 30.9% of all Arabidopsis and Eucalyptus genes, respectively. We identified 296 potential flowering time genes by comparison to the FLOR-ID database. These were annotated to 163 unique Arabidopsis genes and 216 unique Eucalyptus genes.
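The reported percentages can be checked against the counts given above. The sketch assumes that the contig percentages use the 13,344 putative genes as the denominator, which the text implies but does not state explicitly.

```python
# Counts as reported in the text.
total_snps, fixed_snps = 71_160, 24_536
genes, fixed_only, no_snps = 13_344, 1_468, 4_018

pct_fixed_snps = 100 * fixed_snps / total_snps  # share of SNPs that are fixed
pct_fixed_only = 100 * fixed_only / genes       # contigs with only fixed SNPs
pct_no_snps = 100 * no_snps / genes             # contigs with no SNPs
```

Rounded to whole numbers, these reproduce the 34%, 11% and 30% figures in the paragraph.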

LHI Metrosideros species are highly genetically differentiated

In the population structure analyses, individuals consistently showed total association to their respective species. The population was best described by two genetically distinct clusters (K = 2) with no detected admixture in the fastSTRUCTURE analysis (Fig. 2B). Our MDS approach showed the species were fully differentiated along the first MDS dimension (Fig. 2A). Metrosideros sclerocarpa individuals were far more tightly clustered than M. nervulosa, and distribution of M. nervulosa along the second MDS dimension was significantly correlated with sampling altitude (Pearson’s product-moment correlation, r = −0.888; P < 1.0 × 10⁻³). Both species were monophyletic in the phylogeny (Fig. 2C) but M. sclerocarpa had significantly shorter branch lengths. This, along with its tighter clustering in the MDS, and lower genetic diversity relative to M. nervulosa (Mann-Whitney U test: mean π for M. nervulosa = 0.0009, mean π for M. sclerocarpa = 0.0001, P = 2.2 × 10⁻¹⁶; Fig. S1) may be evidence of a genetic bottleneck in M. sclerocarpa.

LHI Metrosideros are reproductively isolated by flowering time

Flowering time was significantly different between the species (Mann-Whitney U test; P = 1.07 × 10⁻¹³). Median flowering time of M. nervulosa was the 14th of November, 53 days earlier than that of M. sclerocarpa on the 6th of January (Fig. 3). If reinforcement had operated to shift flowering time between the species, the effect would be expected to be stronger in areas of sympatry. To examine this, we divided our flowering time data into areas where both species are known to grow sympatrically and those where they grow in isolation. In M. sclerocarpa, median flowering time was indeed 30 days later (i.e. more different from M. nervulosa flowering time) in sympatric areas than in allopatric areas, although the result was not statistically significant (Mann-Whitney U test; P = 0.24). Median flowering time for M. nervulosa in sympatric areas was six days later than in allopatric areas (i.e. marginally closer to M. sclerocarpa flowering time) but this difference was also non-significant (Mann-Whitney U test; P = 0.3).

Multiple flowering time and ecologically relevant genes are under selection

Overall, we identified 30 genes that are candidates for involvement in flowering time differences between the species. These were homologues of the Arabidopsis genes AGL20, PUB13, MSI1, ADC2, ELF3, BBX19, ZTL, LUX, SHL, FVE, AT3G10390, HUB1, AGL42, SUS4, HDA9, HAM1, CRY1, HXK1, UBC2, RR2, GCR1, RRP6L2, GA20OX2, AGL8, UPB13, LCL5, ARF-2, FLC, AT2G28550 and TEM1. Eight of these were outliers in the FST and dXY outlier scan (Fig. S2), and 29 were outliers in both of the FST and π outlier scans (Table S4). In all but one of these genes, different alleles were completely fixed for each species (i.e. FST = 1 and π = 0 for both species). Our GO enrichment analysis found between one and nineteen GO terms enriched in each set of outliers (FST and dXY outliers: 19 GO terms, FST and dXY strong outliers: 15, FST and π M. nervulosa outliers: 17, FST and π M. nervulosa strong outliers: 17, FST and π M. sclerocarpa outliers: 17, FST and π M. sclerocarpa strong outliers: 17, Dtaj M. nervulosa outliers: 6, Dtaj M. nervulosa strong outliers: 9, Dtaj M. sclerocarpa outliers: 1, Dtaj M. sclerocarpa strong outliers: 1). These included some GO terms with ecological relevance to Metrosideros, such as “cellular response to osmotic stress”, “response to light intensity” and “cellular response to abiotic stimulus”, and it is possible that these genes are involved in ecological adaptation to the species’ differing habitats on the island.

Discussion

Speciation may be complete in Metrosideros

Our results indicate that speciation between M. nervulosa and M. sclerocarpa is in a late stage, and may even be complete. Structure and MDS analyses showed that they were highly differentiated and a large proportion of genes (11%) were fixed for different alleles in the two species. Furthermore, previous work has not identified any hybrids, and indeed Papadopulos et al. [15] found low levels of admixture, with only two of 150 individuals having over 10% admixture in an AFLP-based STRUCTURE analysis [46]. As with previous work [15], we found evidence that M. sclerocarpa has far lower genetic diversity than M. nervulosa. This may be evidence that the species has undergone a bottleneck. In contrast to a previous AFLP-based phylogenetic analysis of the species [15], in which M. sclerocarpa formed a clade within M. nervulosa, our phylogenetic tree shows that both species are monophyletic (Fig. 2C). In addition to the fact that speciation may be complete or almost complete in these species, there is strong evidence that speciation happened in situ on LHI. As in Howea, and for some other species on LHI, the fact that the island is both very small (12 km at its longest diameter) and very isolated (600 km from the nearest other landmass) makes a period of allopatry implausible [9,12,13]. While the island, as well as the nearby sea stack Ball’s Pyramid, has been larger in the past due to lower sea levels, the divergence time of the species (3.53 million years) coincides with global mean sea levels around 20 m higher than today [47]. Thus, given that LHI Metrosideros most likely speciated in sympatry, their near complete reproductive isolation makes the species rather unusual. The majority of examples of non-allopatric speciation involve recent or ongoing gene flow [48–50], and even in Howea, which has substantial postzygotic and prezygotic RI and is regarded as one of the most convincing examples of sympatric speciation, several adult hybrids have been identified [10].

One additional point of interest in the system is that M. nervulosa also has a yellow-flowered variety endemic to a small area on Mount Lidgbird, one of the two southern mountains on LHI (IH; personal observation; example specimens can be found in herbaria, e.g. http://specimens.kew.org/herbarium/K000566481). While we have not sampled the yellow-flowered variety in this study, and indeed no researchers have investigated it in detail to our knowledge, it is possible that it represents a second case of speciation in LHI Metrosideros, particularly since the evolution of flower colour is known to lead to RI via pollinator shifts in many plant species [51–53], although this is not always the case [54]. Therefore, the species represent an excellent case study of very late (plus perhaps also the early) stages of speciation in geographic isolation.

The evolution of flowering time-based prezygotic reproductive isolation in Metrosideros

Our flowering time analysis confirmed earlier reports that the species had diverged in flowering phenology, and this is likely to significantly increase RI between the species. However, there are several possibilities for the role it may have played in speciation. Firstly, flowering time divergence could arise as a plastic response to the colonisation of a new environment (such as migration to a higher altitude), which would not require selection [55]. Secondly, selection could indirectly affect flowering time by a pleiotropic mechanism, whereby ecological selection acts on genes which also affect flowering time [6]. Thirdly, flowering time divergence could be directly selected as a reinforcement mechanism to reduce hybridisation before speciation is complete [52]. Fourthly, flowering time divergence could be directly selected after speciation is complete, for example as a mechanism to avoid wasting reproductive effort on inviable hybrid seeds. While some authors consider the latter as a form of reinforcement [52], others prefer to term it reproductive character displacement [7], since it does not contribute to speciation. None of these possibilities are mutually exclusive, and each could feasibly contribute to the evolution of flowering time divergence at different stages in the speciation process.

Reinforcement is often detectable by a stronger divergence of reproductive traits in sympatry than in allopatry [51]. While the difference between flowering times for sympatric and allopatric populations of M. nervulosa and M. sclerocarpa was non-significant, this may be a result of sample size. While we had reasonable sample sizes for each species (M. nervulosa = 74, M. sclerocarpa = 39), these were not evenly distributed between sympatric and allopatric areas, and in particular we only had nine observations from allopatric M. sclerocarpa. While our data collection method was necessarily limited by the availability of specimens and observations, it is possible that a detailed survey would uncover differences. Previous work has noted that flowering time is later at higher altitude in both species [15]. Since the area of sympatry is at intermediate altitudes (at the lower and higher altitudinal extremes of ranges for M. nervulosa and M. sclerocarpa, respectively) and M. nervulosa flowers earlier than M. sclerocarpa overall (Fig. 3), this would mean that the sympatric area would have the most different flowering times between the species (the latest M. sclerocarpa flowering times and the earliest M. nervulosa flowering times). A detailed flowering time study should certainly be a priority for future research in these species. This should ideally be accompanied by pollination experiments and population genetic tests of gene flow in order to determine whether speciation is complete and thus differentiate between reinforcement as a contributing factor to speciation and reproductive character displacement following speciation [7].

Evidence for selection on flowering time and ecologically relevant genes

While our results cannot conclusively measure the relative importance of different evolutionary processes in generating flowering time divergence in Metrosideros, they do provide evidence that it has been driven by selection. Both reinforcement and pleiotropic mechanisms involve selection on genes that affect flowering time, and we identified 30 flowering time genes with evidence of selection. Of these, ARF-2 and ADC2 are of particular interest as they are also involved in drought and salt stress, respectively, in Arabidopsis [56,57]. It is possible that these genes could provide a pleiotropic link between flowering time divergence and ecological variables such as water availability and distance to the coast, which are known to differ between the species’ habitats [14]. Unexpectedly, three outlier genes were annotated with the GO term “pollen germination”, a significant enrichment. This raises the possibility that prezygotic isolation by way of pollen-stigma interactions, in which heterospecific pollen nuclei are prevented from reaching the ovule, could be important in these species [4]. However, such hypotheses are speculative at present.

There were also genes with evidence of selection that could be involved in ecological adaptation in the species. Previous work has shown that the species are genetically isolated by altitude, proximity to the coast, proximity to creeks, light availability, soil pH and wind exposure, after controlling for geographic distance [14]. Many of these variables were reflected in the genes under selection in the species. For example, four of the 23 genes annotated with the GO term “cellular response to osmotic stress” showed evidence of selection, and this represented a significant enrichment. Since M. sclerocarpa prefers to grow around creeks, whereas M. nervulosa is more common on exposed ridges, it is highly plausible that the species are differentially adapted in their response to osmotic stress caused by reduced water availability. Furthermore, the GO term “regulation of root development” was significantly enriched amongst all sets of outliers, which is also potentially related to water availability. Light also differs between the habitats of the two species, and this was reflected in the enrichment of genes annotated with the GO terms “response to light intensity”, “chlorophyll biosynthetic process” and “response to blue light” among selection candidates. The species are clearly differentially ecologically adapted, but the extent to which this leads to post-zygotic RI due to reduced hybrid fitness remains an open question. Crossing the species (if indeed they are capable of producing viable hybrids) and measuring the fitness of hybrids would improve our understanding of the speciation process.
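To illustrate what an enrichment such as four outliers among 23 annotated genes means, the sketch below computes a one-sided hypergeometric tail probability. This is a generic over-representation test and is not necessarily the test used in our analyses; the function name is ours, and the background size (10,000 genes) and total number of outliers (500) are hypothetical values, since only the counts 4 and 23 are given in the text.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_outliers: int, n_term: int, n_term_outliers: int) -> float:
    """One-sided P(X >= n_term_outliers), where X is the number of term-annotated
    genes among n_outliers genes drawn without replacement from n_genes genes,
    of which n_term carry the GO term."""
    upper = min(n_term, n_outliers)
    total = comb(n_genes, n_outliers)
    tail = sum(
        comb(n_term, k) * comb(n_genes - n_term, n_outliers - k)
        for k in range(n_term_outliers, upper + 1)
    )
    return tail / total


# 4 and 23 come from the text; 10_000 and 500 are hypothetical placeholders.
p_value = hypergeom_enrichment_p(n_genes=10_000, n_outliers=500, n_term=23, n_term_outliers=4)
print(f"P(at least 4 of 23 annotated genes are outliers by chance) = {p_value:.4f}")
```

With a 5% outlier fraction, roughly one of the 23 annotated genes would be expected among the outliers by chance, so observing four is unlikely under these assumed background counts. The result depends directly on the background and outlier totals supplied.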

Our RNA-seq dataset provided evidence that flowering time genes are under selection and gave insights into the phylogeny and population structure of the species, but there are some caveats. Although RNA-seq is effective at providing sequence data for many protein-coding genes, it has pitfalls for the identification of genes under selection. For example, regions of high differentiation can be megabases long [49], orders of magnitude longer than most transcripts. For this reason, genes in our study showing signatures of selection are not necessarily all targets of selection themselves, as their high divergence could be the result of hitchhiking effects. Furthermore, since only expressed genes can be sequenced, we may have missed some genes under selection. This is of particular relevance because our sampling was conducted outside the flowering season for both species, so it is likely that some flowering time genes were missing from our dataset. Finally, RNA-seq data consist almost entirely of genic sequence, which is unlikely to be evolving neutrally. This makes them unsuitable for demographic modelling, since selection can severely bias demographic parameter estimates [58]. Demographic modelling would be of particular interest in these species, especially for estimating the history of gene flow between them. An important next step would be to assemble a high-quality genome and generate resequencing data for multiple individuals, which would allow both more complete analyses of selection and demographic analyses. Nevertheless, the data we present here are the first high-throughput sequencing data for these species and provide an important basis for future research.

Conclusion

Our results show that the two Metrosideros species of LHI offer a rare opportunity to study the late stages of speciation without geographic isolation. We have shown that flowering time is highly divergent between the species and that both species are monophyletic, and our population genetic analyses suggest that flowering time divergence may be driven by selection on known flowering time genes. However, it remains unclear at which point during speciation, and by which evolutionary mechanisms, these differences evolved. Unpicking these questions will likely provide important insights into plant speciation in general, given that the cessation of gene flow between newly evolved species within such a small geographic area appears to be rare in nature.

Acknowledgements

We thank Rishi De-Kayne for assistance in sample collection, Trevor Wilson, Greta Frankham and Rebecca Johnson for assistance with sample storage, Alexander Papadopulos and Adam Ciezarek for advice on analyses, the editor Kay Lucek and three anonymous referees for comments, the LHI Board and New South Wales Parks and Wildlife Service for permits, and the UK Natural Environment Research Council for funding.

Author contributions

VS, OGO and IH conceived the study; VS supervised the project; OGO and IH collected samples; IH, MPD and OGO collected and collated flowering time data; OGO, TK and TB analysed the data; OGO wrote the initial draft; and all authors contributed to the final version of the manuscript.

References

1.Nosil P, Harmon LJ, Seehausen O. 2009 Ecological explanations for (incomplete) speciation. Trends Ecol. Evol. 24, 145–156. (doi:10.1016/j.tree.2008.10.011)

2.Nosil P, Sandoval CP, Crespi BJ. 2006 The evolution of host preference in allopatric vs. parapatric populations of Timema cristinae walking-sticks. J. Evol. Biol. 19, 929–942. (doi:10.1111/j.1420-9101.2005.01035.x)

3.Coyne JA, Orr HA. 2004 Speciation. Sunderland, MA: Sinauer Associates Inc.

4.Baack E, Melo MC, Rieseberg LH, Ortiz-Barrientos D. 2015 The origins of reproductive isolation in plants. New Phytol. 207, 968–984. (doi:10.1111/nph.13424)

5.Lowry DB, Modliszewski JL, Wright KM, Wu CA, Willis JH. 2008 The strength and genetic basis of reproductive isolating barriers in flowering plants. Philos. Trans. R. Soc. B Biol. Sci. 363, 3009–3021. (doi:10.1098/rstb.2008.0064)

6.Servedio MR, van Doorn GS, Kopp M, Frame AM, Nosil P. 2011 Magic traits in speciation: ‘magic’ but not rare? Trends Ecol. Evol. 26, 389–397. (doi:10.1016/j.tree.2011.04.005)

7.Butlin R. 1987 Speciation by reinforcement. Trends Ecol. Evol. 2, 8–13. (doi:10.1016/0169-5347(87)90193-5)

8.Silvertown J, Servaes C, Biss P, Macleod D. 2005 Reinforcement of reproductive isolation between adjacent populations in the Park Grass Experiment. Heredity 95, 198–205. (doi:10.1038/sj.hdy.6800710)

9.Savolainen V et al. 2006 Sympatric speciation in palms on an oceanic island. Nature 441, 210–213. (doi:10.1038/nature04566)

10.Hipperson H et al. 2016 Ecological speciation in sympatric palms: 2. Pre- and post-zygotic isolation. J. Evol. Biol. 29, 2143–2156. (doi:10.1111/jeb.12933)

11.Dunning LT et al. 2016 Ecological speciation in sympatric palms: 1. Gene expression, selection and pleiotropy. J. Evol. Biol. 29, 1472–1487. (doi:10.1111/jeb.12895)

12.Osborne OG, Ciezarek A, Wilson T, Crayn D, Hutton I, Baker WJ, Turnbull CGN, Savolainen V. 2019 Speciation in Howea palms occurred in sympatry, was preceded by ancestral admixture, and was associated with edaphic and phenological adaptation. Mol. Biol. Evol. 36, 2682–2697. (doi:10.1093/molbev/msz166)

13.Papadopulos AST, Baker WJ, Crayn D, Butlin RK, Kynast RG, Hutton I, Savolainen V. 2011 Speciation with gene flow on Lord Howe Island. Proc. Natl. Acad. Sci. 108, 13188–13193. (doi:10.1073/pnas.1106085108)

14.Papadopulos AST et al. 2014 Evaluation of genetic isolation within an island flora reveals unusually widespread local adaptation and supports sympatric speciation. Philos. Trans. R. Soc. B Biol. Sci. 369, 20130342. (doi:10.1098/rstb.2013.0342)

15.Papadopulos AST, Price Z, Devaux C, Hipperson H, Smadja CM, Hutton I, Baker WJ, Butlin RK, Savolainen V. 2013 A comparative analysis of the mechanisms underlying speciation on Lord Howe Island. J. Evol. Biol. 26, 733–745. (doi:10.1111/jeb.12071)

16.Macmanes MD. 2014 On the optimal trimming of high-throughput mRNA sequence data. Front. Genet. 5, 13. (doi:10.3389/fgene.2014.00013)

17.Andrews S. 2010 FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc.

18.Song L, Florea L. 2015 Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. Gigascience 4, 48. (doi:10.1186/s13742-015-0089-y)

19.Bolger AM, Lohse M, Usadel B. 2014 Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. (doi:10.1093/bioinformatics/btu170)

20.Liu J, Li G, Chang Z, Yu T, Liu B, Mcmullen R. 2016 BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data. PLoS Comput. Biol. 12, e1004772. (doi:10.1371/journal.pcbi.1004772)

21.Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X. 2015 Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol. 16, 30. (doi:10.1186/s13059-015-0596-2)

22.Peng Y, Leung HCM, Yiu SM, Lv MJ, Zhu XG, Chin FYL. 2013 IDBA-tran: A more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels. Bioinformatics 29, 326–334. (doi:10.1093/bioinformatics/btt219)

23.Schulz MH, Zerbino DR, Vingron M, Birney E. 2012 Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28, 1086–1092. (doi:10.1093/bioinformatics/bts094)

24.Kannan S, Hui J, Mazooji K. 2016 Shannon: An Information-Optimal de Novo RNA-Seq Assembler. bioRxiv (doi:10.1101/039230)

25.Xie Y et al. 2014 SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30, 1660–1666. (doi:10.1093/bioinformatics/btu077)

26.Robertson G et al. 2010 De novo assembly and analysis of RNA-seq data. Nat. Methods 7, 909–912. (doi:10.1038/nmeth.1517)

27.Grabherr MG et al. 2011 Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29, 644–652. (doi:10.1038/nbt.1883)

28.Haas BJ et al. 2013 De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512. (doi:10.1038/nprot.2013.084)

29.Fu L, Niu B, Zhu Z, Wu S, Li W. 2012 CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152. (doi:10.1093/bioinformatics/bts565)

30.Davidson NM, Oshlack A. 2014 Corset: Enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 15, 1–14. (doi:10.1186/s13059-014-0410-6)

31.Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. 2013 STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21. (doi:10.1093/bioinformatics/bts635)

32.Goodstein DM et al. 2012 Phytozome: A comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186. (doi:10.1093/nar/gkr944)

33.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009 BLAST+: Architecture and applications. BMC Bioinformatics 10, 1–9. (doi:10.1186/1471-2105-10-421)

34.Bouché F, Lobet G, Tocquin P, Périlleux C. 2016 FLOR-ID: An interactive database of flowering-time gene networks in Arabidopsis thaliana. Nucleic Acids Res. 44, D1167–D1171. (doi:10.1093/nar/gkv1054)

35.Raj A, Stephens M, Pritchard JK. 2014 FastSTRUCTURE: Variational inference of population structure in large SNP data sets. Genetics 197, 573–589. (doi:10.1534/genetics.114.164350)

36.Purcell S et al. 2007 PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. (doi:10.1086/519795)

37.Weir BS, Cockerham CC. 1984 Estimating F-Statistics for the Analysis of Population Structure. Evolution 38, 1358–1370.

38.Nei M. 1987 Molecular Evolutionary Genetics. New York: Columbia University Press.

39.Tajima F. 1989 Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123, 585–595. (PMCID: PMC1203831)

40.Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ. 2014 PopGenome: An efficient swiss army knife for population genomic analyses in R. Mol. Biol. Evol. 31, 1929–1936. (doi:10.1093/molbev/msu136)

41.Cruickshank TE, Hahn MW. 2014 Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23, 3133–3157. (doi:10.1111/mec.12796)

42.Verity R, Collins C, Card DC, Schaal SM, Wang L, Lotterhos KE. 2017 minotaur: A platform for the analysis and visualization of multivariate results from genome scans with R Shiny. Mol. Ecol. Resour. 17, 33–43. (doi:10.1111/1755-0998.12579)

43.Lotterhos KE, Card DC, Schaal SM, Wang L, Collins C, Verity B. 2017 Composite measures of selection can improve the signal-to-noise ratio in genome scans. Methods Ecol. Evol. 8, 717–727. (doi:10.1111/2041-210X.12774)

44.Alexa A, Rahnenführer J, Lengauer T. 2006 Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607. (doi:10.1093/bioinformatics/btl140)

45.Munson SM, Long AL. 2017 Climate drives shifts in grass reproductive phenology across the western USA. New Phytol. 213, 1945–1955. (doi:10.1111/nph.14327)

46.Pritchard JK, Stephens M, Donnelly P. 2000 Inference of population structure using multilocus genotype data. Genetics 155, 945–959.

47.Dumitru OA, Austermann J, Polyak VJ, Fornós JJ, Asmerom Y, Ginés J, Ginés A, Onac BP. 2019 Constraints on global mean sea level during Pliocene warmth. Nature 574, 233–236. (doi:10.1038/s41586-019-1543-2)

48.Martin CH, Cutler JS, Friel JP, Touokong CD, Coop G, Wainwright PC. 2015 Complex histories of repeated gene flow in Cameroon crater lake cichlids cast doubt on one of the clearest examples of sympatric speciation. Evolution 69, 1406–1422. (doi:10.1111/evo.12674)

49.Backström N, Sætre GP, Ellegren H. 2013 Inferring the demographic history of European Ficedula flycatcher populations. BMC Evol. Biol. 13, 2. (doi:10.1186/1471-2148-13-2)

50.Comeault AA, Flaxman SM, Schwander T, Curran E. 2015 Selection on a Genetic Polymorphism Counteracts Ecological Speciation in a Stick Insect. Curr. Biol. 25, 1975–1981. (doi:10.1016/j.cub.2015.05.058)

51.Hopkins R, Rausher MD. 2012 Pollinator-mediated selection on flower color allele drives reinforcement. Science 335, 1090–1092. (doi:10.1126/science.1215198)

52.Hopkins R. 2013 Reinforcement in plants. New Phytol. 197, 1095–1103. (doi:10.1111/nph.12119)

53.Jordan CY, Ally D, Hodgins KA. 2015 When can stress facilitate divergence by altering time to flowering? Ecol. Evol. 5, 5962–5973. (doi:10.1002/ece3.1821)

54.Tavares H et al. 2018 Selection and gene flow shape genomic islands that control floral guides. Proc. Natl. Acad. Sci. 115, 11006–11011. (doi:10.1073/pnas.1801832115)

55.Blackman BK. 2017 Changing responses to changing seasons: Natural variation in the plasticity of flowering time. Plant Physiol. 173, 16–26. (doi:10.1104/pp.16.01683)

56.Meng LS, Wang ZB, Yao SQ, Liu A. 2015 The ARF2-ANT-COR15A gene cascade regulates ABA-signaling mediated resistance of large seeds to drought in Arabidopsis. J. Cell Sci. 128, 3922–3932. (doi:10.1242/jcs.171207)

57.Urano K, Yoshiba Y, Nanjo T, Ito T, Yamaguchi-Shinozaki K, Shinozaki K. 2004 Arabidopsis stress-inducible gene for arginine decarboxylase AtADC2 is required for accumulation of putrescine in salt tolerance. Biochem. Biophys. Res. Commun. 313, 369–375. (doi:10.1016/j.bbrc.2003.11.119)

58.Schrider DR, Shanku AG, Kern AD. 2016 Effects of linked selective sweeps on demographic inference and model selection. Genetics 204, 1207–1223. (doi:10.1534/genetics.116.190223)

Figure legends

Figure 1. Sampling locations and species morphology. A map of Lord Howe Island, Australia (A), shows sampling locations of all sequenced samples. The map is coloured by elevation and whether samples were growing in allopatry or sympatry is indicated in the legend. Sample codes on the map refer to those in Table S1. Photographs show Metrosideros nervulosa (B) and M. sclerocarpa (C), growing in situ on Lord Howe Island.

Figure 2. Population structure of 20 Metrosideros individuals. (A) There is a clear partitioning of Metrosideros nervulosa (red) and M. sclerocarpa (blue) along the first MDS dimension. (B) No intraspecific population structure or admixture was inferred from a genetic clustering analysis at the most likely number of clusters (K = 2). Each individual is represented by a vertical bar which is partitioned into two colours representing estimated membership of the two clusters. (C) A rooted phylogenetic tree of all individuals. Labels for M. nervulosa individuals are shown in red and those for M. sclerocarpa individuals are shown in blue. The scale bar shows substitutions per site. Node shading and legend show bootstrap support. Both species are monophyletic with 100% bootstrap support.

Figure 3. Flowering time differences between Metrosideros nervulosa and M. sclerocarpa. Red and blue solid lines show densities of flowering times for M. nervulosa and M. sclerocarpa, respectively, in units of days after July 1st. Dashed lines show the median flowering time for each species.
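Because flowering in these species can span the turn of the calendar year, flowering times in Figure 3 are expressed as days after July 1st. A minimal sketch of such a conversion is given below; the function name, the rule assigning dates before July to the season that began in the previous year, and the observation dates are all assumptions for illustration rather than a description of our exact procedure or data.

```python
from datetime import date
from statistics import median


def days_after_july_1(observed: date) -> int:
    """Days elapsed since the most recent July 1st on or before `observed`.

    Assumption: dates from January to June belong to the season that
    started on July 1st of the previous calendar year.
    """
    season_start_year = observed.year if observed.month >= 7 else observed.year - 1
    return (observed - date(season_start_year, 7, 1)).days


# Hypothetical flowering observations (not data from the study).
observations = [date(2018, 11, 20), date(2018, 12, 10), date(2019, 1, 5)]
days = [days_after_july_1(d) for d in observations]
median_day = median(days)

print(days, median_day)
```

Anchoring the scale in mid-winter keeps a season that runs from late one year into early the next on a single continuous axis, so that medians and densities are not split by the change of year.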