Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant...

10
Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes Elizabeth Skippington a , Todd J. Barkman b , Danny W. Rice a , and Jeffrey D. Palmer a,1 a Department of Biology, Indiana University, Bloomington, IN 47405; and b Department of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008 Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved June 1, 2015 (received for review March 5, 2015) Despite the enormous diversity among parasitic angiosperms in form and structure, life-history strategies, and plastid genomes, little is known about the diversity of their mitogenomes. We report the sequence of the wonderfully bizarre mitogenome of the hemiparasitic aerial mistletoe Viscum scurruloideum. This ge- nome is only 66 kb in size, making it the smallest known angio- sperm mitogenome by a factor of more than three and the smallest land plant mitogenome. Accompanying this size reduc- tion is exceptional reduction of gene content. Much of this reduc- tion arises from the unexpected loss of respiratory complex I (NADH dehydrogenase), universally present in all 300+ other an- giosperms examined, where it is encoded by nine mitochondrial and many nuclear nad genes. Loss of complex I in a multicellular organism is unprecedented. We explore the potential relationship between this loss in Viscum and its parasitic lifestyle. Despite its small size, the Viscum mitogenome is unusually rich in recombina- tionally active repeats, possessing unparalleled levels of predicted sublimons resulting from recombination across short repeats. Many mitochondrial gene products exhibit extraordinary levels of divergence in Viscum, indicative of highly relaxed if not positive se- lection. In addition, all Viscum mitochondrial protein genes have experienced a dramatic acceleration in synonymous substitution rates, consistent with the hypothesis of genomic streamlining in response to a high mutation rate but completely opposite to the pattern seen for the high-rate but enormous mitogenomes of Silene. In sum, the Viscum mitogenome possesses a unique con- stellation of extremely unusual features, a subset of which may be related to its parasitic lifestyle. mitogenome | mutation rate | complex I | parasitic plants | genome reduction P arasitism has evolved at least 12 or 13 times in angiosperms, with parasitic plants comprising about 1% of known angio- sperm species (1, 2). Parasitic angiosperms vary enormously in size, life history, anatomy, physiological adaptation to a parasitic lifestyle, and nutritional dependence on host plants (3). Plastid genomes have been sequenced from many parasitic species and lineages and vary enormously in size and gene content, from those that are typical of nonparasitic photosynthetic plants to Rafflesia lagascae, which may not even have a plastid genome (46). Despite this great diversity, the mitogenomes of parasitic plants are largely unexplored. Most studies of parasitic plant mtDNAs have dealt primarily with their apparent propensity for uptake of foreign DNA via horizontal gene transfer (HGT). HGT is relatively common in angiosperm mitogenomes, and is especially common in parasitic plants, in which HGT is facilitated by the intimate physical connection between parasites and their host plant (1, 2, 7). Rafflesiaceae, a small family of endophytic holoparasites fa- mous for possessing the largest flowers in the world, is the only parasitic plant clade for which large-scale mitochondrial genomic sequencing has been undertaken (4, 8). Sequences of 38 genes common to three Rafflesiaceae species and two host species revealed that host-to-parasite HGT has been frequent in Raf- flesiaceae mitogenomes, which otherwise are relatively unremark- able with respect to gene content and sequence divergence (8). Depending on the Rafflesiaceae species, 2441% of protein genes are inferred to have been acquired by HGT. The repetitive nature of Rafflesiaceae mtDNAs and the short reads used in these studies rendered assembly of complete genome sequences impractical, but with a minimum size of 320 kb, the Rafflesia lagascae mito- genome (4) falls within the known angiosperm size range (0.211.3 Mb) (9, 10). To help remedy the lack of knowledge of parasitic plant mitogenomes, we selected the Santalales for comparative se- quencing for two reasons. First, the Santalales is the largest group of parasitic plants (with more than 2,000 parasitic species) and is also highly diverse (11, 12). This diversity is manifest in terms of autotrophic autonomy, with members of the order ranging from free-living nonparasitic trees to green, photo- synthesizing hemiparasites to highly derived species that lack leaves and maintain only minimal photosynthesis at narrow points in the life cycle. The order also includes holoparasites (completely nonphotosynthetic heterotrophs) if further phylo- genetic study confirms the tentative placement of the bizarre holoparasitic family Balanophoraceae as sister to, if not a member of, Santalales (2, 13). Another important aspect of this diversity is that Santalales have undergone multiple transitions from root parasitism (the ancestral parasitic condition in the group) to aerial parasitism, with these mistletoesrepresenting Significance The mitochondrial genomes of flowering plants are character- ized by an extreme and often perplexing diversity in size, or- ganization, and mutation rate, but their primary genetic function, in respiration, is extremely well conserved. Here we present the mitochondrial genome of an aerobic parasitic plant, the mistletoe Viscum scurruloideum. This genome is miniaturized, shows clear signs of rapid and degenerative evolution, and lacks all genes for complex I of the respiratory electron-transfer chain. To our knowledge, this is the first re- port of the loss of this key respiratory complex in any multi- cellular eukaryote. The Viscum mitochondrial genome has taken a unique overall tack in evolution that, to some extent, likely reflects the progression of a specialized parasitic lifestyle. Author contributions: E.S. and J.D.P. designed research; E.S., T.J.B., and D.W.R. performed research; D.W.R. contributed new reagents/analytic tools; E.S., T.J.B., D.W.R., and J.D.P. analyzed data; and E.S. and J.D.P. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Data deposition: The sequence reported in this paper has been deposited in the GenBank database, www.ncbi.nlm.nih.gov/genbank/ (accession nos. KT022222 and KT022223). 1 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1504491112/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1504491112 PNAS | Published online June 22, 2015 | E3515E3524 EVOLUTION PNAS PLUS

Transcript of Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant...

Page 1: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

Miniaturized mitogenome of the parasitic plantViscum scurruloideum is extremely divergent anddynamic and has lost all nad genesElizabeth Skippingtona, Todd J. Barkmanb, Danny W. Ricea, and Jeffrey D. Palmera,1

aDepartment of Biology, Indiana University, Bloomington, IN 47405; and bDepartment of Biological Sciences, Western Michigan University, Kalamazoo,MI 49008

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved June 1, 2015 (received for review March 5, 2015)

Despite the enormous diversity among parasitic angiosperms inform and structure, life-history strategies, and plastid genomes,little is known about the diversity of their mitogenomes. Wereport the sequence of the wonderfully bizarre mitogenome ofthe hemiparasitic aerial mistletoe Viscum scurruloideum. This ge-nome is only 66 kb in size, making it the smallest known angio-sperm mitogenome by a factor of more than three and thesmallest land plant mitogenome. Accompanying this size reduc-tion is exceptional reduction of gene content. Much of this reduc-tion arises from the unexpected loss of respiratory complex I(NADH dehydrogenase), universally present in all 300+ other an-giosperms examined, where it is encoded by nine mitochondrialand many nuclear nad genes. Loss of complex I in a multicellularorganism is unprecedented. We explore the potential relationshipbetween this loss in Viscum and its parasitic lifestyle. Despite itssmall size, the Viscum mitogenome is unusually rich in recombina-tionally active repeats, possessing unparalleled levels of predictedsublimons resulting from recombination across short repeats.Many mitochondrial gene products exhibit extraordinary levels ofdivergence in Viscum, indicative of highly relaxed if not positive se-lection. In addition, all Viscum mitochondrial protein genes haveexperienced a dramatic acceleration in synonymous substitutionrates, consistent with the hypothesis of genomic streamlining inresponse to a high mutation rate but completely opposite to thepattern seen for the high-rate but enormous mitogenomes ofSilene. In sum, the Viscum mitogenome possesses a unique con-stellation of extremely unusual features, a subset of which may berelated to its parasitic lifestyle.

mitogenome | mutation rate | complex I | parasitic plants |genome reduction

Parasitism has evolved at least 12 or 13 times in angiosperms,with parasitic plants comprising about 1% of known angio-

sperm species (1, 2). Parasitic angiosperms vary enormously insize, life history, anatomy, physiological adaptation to a parasiticlifestyle, and nutritional dependence on host plants (3). Plastidgenomes have been sequenced from many parasitic species andlineages and vary enormously in size and gene content, fromthose that are typical of nonparasitic photosynthetic plants toRafflesia lagascae, which may not even have a plastid genome (4–6). Despite this great diversity, the mitogenomes of parasiticplants are largely unexplored. Most studies of parasitic plantmtDNAs have dealt primarily with their apparent propensity foruptake of foreign DNA via horizontal gene transfer (HGT). HGTis relatively common in angiosperm mitogenomes, and is especiallycommon in parasitic plants, in which HGT is facilitated by theintimate physical connection between parasites and their hostplant (1, 2, 7).Rafflesiaceae, a small family of endophytic holoparasites fa-

mous for possessing the largest flowers in the world, is the onlyparasitic plant clade for which large-scale mitochondrial genomicsequencing has been undertaken (4, 8). Sequences of 38 genescommon to three Rafflesiaceae species and two host species

revealed that host-to-parasite HGT has been frequent in Raf-flesiaceae mitogenomes, which otherwise are relatively unremark-able with respect to gene content and sequence divergence (8).Depending on the Rafflesiaceae species, 24–41% of protein genesare inferred to have been acquired by HGT. The repetitive natureof Rafflesiaceae mtDNAs and the short reads used in these studiesrendered assembly of complete genome sequences impractical,but with a minimum size of 320 kb, the Rafflesia lagascae mito-genome (4) falls within the known angiosperm size range (0.2–11.3 Mb) (9, 10).To help remedy the lack of knowledge of parasitic plant

mitogenomes, we selected the Santalales for comparative se-quencing for two reasons. First, the Santalales is the largestgroup of parasitic plants (with more than 2,000 parasitic species)and is also highly diverse (11, 12). This diversity is manifest interms of autotrophic autonomy, with members of the orderranging from free-living nonparasitic trees to green, photo-synthesizing hemiparasites to highly derived species that lackleaves and maintain only minimal photosynthesis at narrowpoints in the life cycle. The order also includes holoparasites(completely nonphotosynthetic heterotrophs) if further phylo-genetic study confirms the tentative placement of the bizarreholoparasitic family Balanophoraceae as sister to, if not amember of, Santalales (2, 13). Another important aspect of thisdiversity is that Santalales have undergone multiple transitionsfrom root parasitism (the ancestral parasitic condition in thegroup) to aerial parasitism, with these “mistletoes” representing

Significance

The mitochondrial genomes of flowering plants are character-ized by an extreme and often perplexing diversity in size, or-ganization, and mutation rate, but their primary geneticfunction, in respiration, is extremely well conserved. Here wepresent the mitochondrial genome of an aerobic parasiticplant, the mistletoe Viscum scurruloideum. This genome isminiaturized, shows clear signs of rapid and degenerativeevolution, and lacks all genes for complex I of the respiratoryelectron-transfer chain. To our knowledge, this is the first re-port of the loss of this key respiratory complex in any multi-cellular eukaryote. The Viscum mitochondrial genome hastaken a unique overall tack in evolution that, to some extent,likely reflects the progression of a specialized parasitic lifestyle.

Author contributions: E.S. and J.D.P. designed research; E.S., T.J.B., and D.W.R. performedresearch; D.W.R. contributed new reagents/analytic tools; E.S., T.J.B., D.W.R., and J.D.P.analyzed data; and E.S. and J.D.P. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequence reported in this paper has been deposited in the GenBankdatabase, www.ncbi.nlm.nih.gov/genbank/ (accession nos. KT022222 and KT022223).1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1504491112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1504491112 PNAS | Published online June 22, 2015 | E3515–E3524

EVOLU

TION

PNASPL

US

Page 2: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

the majority of aerial parasitic plants (14). Second, we recentlyshowed (15) that the Santalales are one of the principal donorsof foreign mtDNA to the amazingly HGT-rich mitogenome ofthe basal angiosperm Amborella trichopoda. Accordingly, wewished to determine the number and identity of the Santalalesdonors and, to the extent possible, the timing and mechanism ofthese transfers.Here we present the mitogenome sequence of the hemi-

parasitic mistletoe Viscum scurruloideum. Unlike Rafflesiaceaemitochondrial genomes, this Viscum genome shows little evi-dence of HGT. However, it is highly unusual in a number ofother ways, including genome size, mutation rate, selectivepressure, levels of repeat-mediated recombination and sub-limons, and the loss of many genes, including the entire suite ofnine mitochondrial genes encoding respiratory complex I. Weconsider the implications of the unprecedented (for multicellularorganisms) loss of complex I and Viscum’s host dependence.

Results and DiscussionThe Smallest Angiosperm Mitochondrial Genome. The Viscum mito-genome assembly consists of two contigs, a circular contig42,186 bp in length and a linear contig 23,687 bp in length (Fig.1A). High-depth contigs (>80×) were constructed from a short(100-bp) paired-end sequencing library (SI Appendix, Fig. S1).Searches for potentially missing mitochondrial sequences didnot produce any similarly high-depth contigs (Materials andMethods). The Viscum genome was compared with 33 genomeschosen to sample the phylogenetic diversity of sequenced an-giosperm mitogenomes (Fig. 2). With the exception of Silene, forwhich four species were included to capture the remarkableheterogeneity in genome size and mutation rate within the genus(9), only one species is represented per genus, and at most twoper family are included. Angiosperm mitogenomes are excep-tional for their large and variable sizes (9, 10). At 66 kb, Viscumis the smallest (by a factor of 3.3) angiosperm mitogenome se-quenced to date (Fig. 2), is smaller than any other land plantgenome (the next smallest is the moss Bauxbaumia at 101 kb),and among charophytes (the green algal ancestors of land plants)is closest in size to some of the smallest known genomes, those ofEntransia (62 kb) and Chara (68 kb) in particular. Despite itssmall size, Viscum’s guanine-cytosine content of 47.4% is un-remarkable compared with that of other angiosperms, whichgenerally fall in the range of 43–45% (10).Between V. scurruloideum and the astoundingly large mito-

genome of Silene conica (11.3 Mb), the known range of angio-sperm mitogenome sizes is now on the order of 200-fold. Instriking contrast to the highly reduced mitogenome of Viscum,the four species of Viscum examined thus far have exceptionallylarge nuclear genomes ranging in size from 2C = 122–201 Gb(16). These are the largest nuclear genome sizes reported foreudicots, a group that contains more than 70% of the ca. 350,000species of angiosperms. No information is available on Viscumplastid genomes.

Dramatically Reduced Core Coding Capacity and Intron Content.Angiosperm mitogenomes virtually always share the same setof 24 core protein genes, mostly coding for respiratory proteins,but often differ substantially in their inventories of a further17 protein genes mostly encoding ribosomal proteins (17, 18).Three species of Silene possess the most protein-gene–poor an-giosperm mitogenomes described to date (9). Two of thesespecies share the entire core set of 24 protein genes, and thethird contains 23 of these genes, with its loss of ccmFc the onlyknown case of such loss in angiosperms. Conversely, the threeSilenes have lost all but one or two of the 17 genes comprising thevariable set (Fig. 2). The Viscum mitogenome presents a dra-matic departure with respect to the core genes. It has lost theentirety of 11 of the 24 core protein genes. These losses include

ccmB, matR, and all nine nad genes (nad1, nad2, nad3, nad4,nad4L, nad5, nad6, nad7, and nad9) (Fig. 2). To check for thepossibility that these genes fall within an assembled portion ofmtDNA, total genomic reads were searched for evidence ofhomology with these genes using BLAST. Because 95% of ourmitochondrial assembly has a read depth greater than 80 (SIAppendix, Fig. S1), we expect mitochondrial reads matchingthese genes also to give depths greater than 80. Although eight ofthese 11 genes yielded a smattering of matches, the low readdepths do not support the conclusion that these reads are ofmitochondrial origin (SI Appendix, SI Discussion). Because theloss of these three classes of genes is highly unexpected—all 11genes are found in all 300+ other angiosperm mitogenomes ex-amined (Fig. 2) (10, 17)—they are discussed in separate sectionsbelow. The Viscum genome also has lost 11 of the 17 variably

Fig. 1. The mitochondrial genome of V. scurruloideum. (A) Gene map, withgene orientation shown for all genes ≥200 nt in length. Note that the largerof the two contigs is circular but is shown as arbitrarily linearized. The dottedline indicates probable trans-splicing of the first intron in the cox2 gene.(B) Repeat map, for both contigs, with curved lines connecting pairs of repeatsand with line width proportional to repeat size. Repeats larger than the libraryfragment size (800 bp) are in gray, repeats with one or more read pairs con-sistent with a rearrangement event across the repeat are in blue, and theremaining repeats are in red. Boxes on the outer ring mark the positions ofgenes, colored as in A. The numbers give genome coordinates in kilobases.

E3516 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al.

Page 3: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

present protein genes, with nine of these genes missing entirelyand the other two present as obvious pseudogenes (Fig. 2). Al-together, only 19 intact and potentially functional protein geneswere identified in the genome assembly. This protein-codingcapacity is the smallest identified in an angiosperm mitogenome,five genes fewer than the already highly reduced capacity in Silene.Three sequences were identified that could be folded into the

typical tRNA structure (SI Appendix, Fig. S2). These sequences,which correspond to trnW(cca), trnfM (cau), and trnK (uuu),have well-conserved anticodon and dihydrouridine stem se-quences but contain mismatches in other regions of the sec-ondary structure. Although the Viscum mitogenome contains farfewer tRNA genes than most other angiosperms, similarly re-duced tRNA gene content (two and three, respectively) has beenreported for the S. conica and Silene noctiflora mitogenomes (SIAppendix, Fig. S3) (9).Large and small subunit rRNA (LSU rRNA and SSU rRNA,

respectively) genes were identified also. Although both rRNAgene sequences are exceptionally divergent (Exceptional Di-vergence of Viscum Protein and rRNA Genes), superimpositiononto the Oenothera berteriana mitochondrial SSU rRNA and Zeamays mitochondrial LSU rRNA secondary structures (SI Ap-pendix, Figs. S4 and S5, respectively) revealed significant con-servation at the secondary structure level, indicating that thesegenes are still functional in Viscum. Secondary structure andphylogenetic analyses were not carried out using the 5S rRNAbecause of its short length (<100 nt), but a pairwise alignment ofits sequence to the Liriodendron 5S rRNA yielded 87% identityand 5% gaps.The ancestral angiosperm mitogenome contained 25 group II

introns (19). Most angiosperm genomes examined contain nearlythis entire set (SI Appendix, Fig. S6), and none have fewer than

18 of these introns (9). Viscum, however, contains only threegroup II introns (SI Appendix, Fig. S6). Most of this reduction inintron number can be attributed to the loss of the nad genes,which commonly possess 19 group II introns in seed plants (SIAppendix, Fig. S6). In addition, Viscum has lost rpl2, whichnormally contains a single group II intron, and a single intron hasbeen lost from each of the otherwise intact genes rps3 and rps10.In addition, the first intron in the Viscum cox2 gene is probablytrans-spliced, given the physical separation of the first exon fromthe rest of the gene (Fig. 1A) and that the exonic boundaries ofthese two regions do not contain stop and start codons, re-spectively, as expected under the alternative model of gene fis-sion. Trans-splicing in this intron is known to have arisen in twoother lineages of vascular plants (20, 21).The loss of so many genes and their (mostly accompanying)

introns accounts for an appreciable fraction (ca. 25%) of the sizereduction in the 66-kb Viscum mitogenome compared with thenext smallest sequenced angiosperm mitogenome (222 kb) in ourcomparative dataset (Fig. 2). Most of the loss in gene space is aconsequence of the loss of the intron-rich nad genes, whichtypically comprise nearly 40 kb of DNA, or more than half thegene space of normal angiosperm mitogenomes (SI Appendix,Fig. S7). The rest of the Viscum size reduction is a consequenceof a massive reduction in the amount of intergenic spacer DNA,which totals ca. 36 kb in Viscum compared with ca. 150 kb in the222-kb Brassica napus genome and more than 11 Mb in S. conica.

Loss of Mitochondrial Maturase Gene matR. The loss of the maturasegene matR could reflect a very rare case of functional transferto the nucleus. However, because matR resides within nad1intron 4, it is more likely that matR was lost along within theentire suite of mitochondrial nad genes in Viscum. MatR is the

Fig. 2. Genome size and protein-gene content of 34 sequenced angiosperm mitochondrial genomes and outgroup Cycas taitungensis. Intact genes are darkgray, pseudogenes are light gray, and absent genes are white. The phylogeny shown is the current best estimate of relationships among these plants (ref. 78and references therein).

Skippington et al. PNAS | Published online June 22, 2015 | E3517

EVOLU

TION

PNASPL

US

Page 4: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

only mitochondrial-specified maturase homolog present in an-giosperms, and although the precise function of MatR in mito-chondria is uncertain, it is clearly homologous to proteins knownto assist in group II intron splicing in other mitochondrial genomes(22). Furthermore, preliminary data suggest that MatR binds toseveral group II introns in vivo (23). If MatR is indeed a poly-valent splicing factor, the precise consequences of the loss of thisgene for the group II intron-splicing process in Viscum need tobe elucidated.

Possible Rare Transfer of ccmB to the Nucleus. In contrast to thematR situation, functional transfer to the nucleus is the bestexplanation for the absence of ccmB from Viscum mtDNA. Inmost plants, mitochondrial biogenesis of cytochrome c is carriedout via a maturation pathway encoded by four mitochondrialccm genes (ccmB, ccmC, ccmFc, and ccmFn) and several nucleargenes (24). Because ccmC, ccmFc, and ccmFn are all present asintact genes in the Viscum mitogenome, albeit in moderately toexceptionally divergent forms (Table 1), the ccm complex is al-most certainly functional in Viscum, and therefore ccmB prob-ably has been functionally located to the nucleus. CcmB is ahydrophobic protein with six transmembrane domains that tra-verse the inner mitochondrial membrane (24). Although func-tional transfer of ccmB has not been reported previously in plants(Fig. 2) (17), if this gene experienced anything approaching ccmFc-and ccmFn-like levels of divergence while in the mitogenome, thisdivergence may have altered the hydrophobicity of CcmB to anextent that allowed it to be imported more readily into the mito-chondrion following ccmB functional transfer to the nucleus. Sucha transfer would be analogous to other cases of rare mitochon-drial-to-nuclear gene transfer, which have been accompanied by areduction in hydrophobicity of the encoded protein (25, 26). Un-fortunately, nuclear coverage in our sequence data (ca. 0.1×;Materials and Methods and SI Appendix, SI Discussion) is inade-quate to detect a full-length transferred copy of ccmB, and thereforetesting of the transfer hypothesis must await nuclear transcriptomeor genome sequencing.

Unprecedented Loss of Complex I in a Multicellular Organism. Theloss of all mitochondrial nad genes in Viscum is of special interestbecause it very likely corresponds to the loss of an entire respiratorycomplex, something that has never been reported before for anyplant or alga, including the holoparasite Rafflesia (4). The nad genescontain five or six trans-spliced introns in angiosperms, with the 14or 15 nad transcription units scattered essentially randomly in therapidly rearranging mitogenomes of angiosperms (10, 19). There-fore, the eradication of all traces of nad genes from the Viscummitogenome probably occurred via many separate deletions, pre-sumably occurring over some period of time. This likelihood,combined with the presence of pseudogenes for other “lost” genes(i.e., rps1 and sdh3), suggests that the loss of selection on complex Imay have occurred much earlier than the loss of rps1 and sdh3.ATP formation in mitochondria occurs mainly through oxi-

dative phosphorylation. This is achieved by a process of electrontransfer through an assemblage of more than 20 carriers that areorganized into complexes I–IV of the electron transport chain(27). With the notable exceptions of four lineages, all respiringeukaryotes contain complex I, a multisubunit, rotenone-sensitiveNADH dehydrogenase that is encoded by 5–12 mitochondrialgenes and a few dozen nuclear nad genes. The four exceptions, inwhich all mitochondrial and nuclear nad genes have vanished,include two distinct lineages of yeasts (28), the cryptomycotanfungus Rozella (29), and the lineage comprising dinoflagellates,apicomplexans, Chromera, and Oxyrrhis (30). That these are allunicellular makes the case in Viscum particularly notable.The loss of the entire suite of mitochondrial nad genes in

Viscum almost certainly reflects the loss of complex I itself, asopposed to the functional transfer of all these genes to the nu-

cleus. Computational searches for “nad-like” reads within thetotal genomic DNA yielded only a small number of matches,most of which corresponded to intronic sequences that are un-likely to be properly expressed in the nucleus (SI Appendix, SIDiscussion). Seven of the nine nad genes found in all 300+ otherangiosperm mitogenomes examined (Fig. 2) (9, 17) have rarely,if ever, been found functionally transferred to the nucleus innonangiosperms. Indeed, it is these seven nad genes that arefound in all examined animal mitogenomes. Three of the seven(nad1, nad4, and nad5) are universally present in all mitoge-nomes of eukaryotes that retain complex I, whereas nad2, nad3,nad4L, and nad6 have been reported lost from the mitochondrialgenome (and presumably functionally transferred to the nucleus)only one to three times across eukaryotes (31). The difficulty offunctionally transferring these seven nad genes to the nucleus isprobably the result of severe mechanistic challenges to the suc-cessful transport of their hydrophobic products across the outermitochondrial membrane and insertional assembly within theinner membrane (32, 33). Therefore, even in the absence of anymeaningful information on nad gene status in the nucleus of V.scurruloideum, the loss of all nine nad genes from the V. scur-ruloideum mitogenome most likely reflects the loss of complex Ifrom this plant.The losses of complex I in two yeast lineages have been ac-

companied by duplications of nuclear-encoded alternative NADHdehydrogenases that bypass complex I and oxidize NADH withoutproton translocation (34). Because these alternative means ofregenerating NAD+ are unlinked to proton pumping in the mito-chondrion, they generate less energy in the form of ATP. Althoughthe evolutionary loss of complex I is unprecedented in plants, mu-tant lines are informative for understanding the potential conse-quences of the loss of the nad genes in Viscum. In particular, threemutant lines, Nicotiana sylvestris CMSII lacking the mitochondrialnad7 gene (35, 36), Arabidopsis thaliana ndufs4 lacking a nuclearnad gene (37), and NCS2 maize plants lacking functional nad4, are

Table 1. Summary of Viscum protein-gene divergence

Gene

Codingsequencelength, nt % amino acid identity*

dN dS dN/dSViscum Lirio† Viscum vs. Lirio† Vitis vs. Lirio†

cox1 1,611 1,572 83 98 0.11 0.77 0.14atp9 219 225 81 93 0.05 0.68 0.07atp1 1,533 1,527 72 95 0.19 1.04 0.18atp6 912 720 71 95 0.21 0.91 0.24cox2 750 762 69 90 0.26 1.01 0.25cob 1,221 1,179 69 97 0.19 1.05 0.18cox3 795 795 66 97 0.22 0.84 0.27rps12 411 375 61 95 0.31 3.89 0.08ccmC 654 699 56 91 0.39 1.06 0.37mttB 789 720 46 95ccmFn 1,467 1,683 43 92rps10 273 357 42 96rpl16 378 432 44 99atp8 483 474 41 96rps4 1,224 1,035 39 93ccmFc 1,215 1,326 38 87sdh4 390 387 34 88rps3 1,518 1,554 32 93atp4 474 558 29 93

Synonymous and nonsynonymous (dS and dN, respectively) divergenceand dN/dS ratios are given for the nine best-conserved mitochondrial proteingenes of Viscum.*Identity calculation excludes gap positions.†Liriodendron tulipifera.

E3518 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al.

Page 5: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

deficient in complex I (38, 39). Like the complex-I–deficient yeasts,these plant mutants remain viable via use of alternative NADHdehydrogenases (37, 38, 40), albeit with constitutively lower mito-chondrial phosphorylation efficiencies. Plants in general have theability to oxidize NADH and NADPH via several nuclear-encodedalternative dehydrogenases (41). The determination of which ofthese dehydrogenases Viscum uses to compensate for the lack ofcomplex I awaits experimental investigation and sequencing of itshuge nuclear genome.The plant complex I mutants grow more slowly than their

respective wild types and have other deficiencies: N. sylvestrisCMSII mutants exhibit male sterility and slow growth (36);A. thaliana ndufs4 shows depressed germination and growth (37);and homoplasmic NC2 maize plants exhibit sterility and severelydepressed growth (42). Because Viscum does not exhibit no-ticeably defective germination or growth, regardless of any del-eterious effects caused by loss of complex I in Viscum, theorganism has been able to adapt and thrive. Analysis of theCMSII mutant has shown that mitochondrial complex I is es-sential for optimal photosynthetic performance (43). In partic-ular, steady-state photosynthesis in CMSII is consistently reducedby 20–30% at atmospheric CO2 levels, suggesting that this com-plex I mutant is not able to use its photosynthetic capacity as wellas wild-type N. sylvestris despite the action of alternative de-hydrogenases (40, 44). It has long been known that mistletoestypically have lower photosynthetic rates than their hosts (45–48).Thus, the lower rates of photosynthesis and the loss of complex Imay be interrelated in Viscum.The precise molecular interactions underpinning the re-

lationship between reduced photosynthetic efficiency and themitochondrial electron transport chain have not been described,but it is notable that under conditions in which CO2 fixation(and, hence, sucrose synthesis) is enhanced, photosynthesis inCMSII is less inhibited relative to wild-type N. sylvestris. Thisobservation suggests that the decrease in photosynthesis inCMSII under atmospheric CO2 levels is not caused by in-sufficient mitochondrial ATP production for sucrose synthesisand is specific to conditions in which the enzymes necessary forphotosynthesis are preoccupied with another task (44). We hy-pothesize that Viscum uses alternative dehydrogenases to com-pensate for the loss of complex I in maintaining mitochondrialATP production but suffers an associated reduction in photo-synthetic efficiency. Whether cause or cure, parasitism couldcompensate for some loss of photosynthetic ability.

Recent HGT of the cox1 Intron.We find little evidence suggestive ofHGT in Viscum mtDNA. First, there is the lack of multiple,divergent copies of any genes in the Viscum mitogenome. Incontrast, most cases of plant mitochondrial HGT reported thusfar have resulted in the maintenance of both horizontally ac-quired and native gene copies within the mitogenome, withRafflesiaceae and Amborella especially notable in this regard (8,15). Second, the extreme divergence of all 21 Viscum protein andrRNA genes (see the next section and SI Appendix, Fig. S8),which confounds phylogenetic detection of HGT, also meansthat any transfers must be relatively ancient, occurring before theaccumulation of a significant fraction of this divergence. Finally,no evidence of gene conversion was found in any Viscum genesusing OrgConv with Bonferroni correction for multiple tests (49).The only clear evidence of HGT in the Viscum mitogenome

involves its sole group I intron, located in the cox1 gene, whichalso is the only group I intron found in any angiosperm mito-genome. This intron has been acquired more or less recently inmany angiosperm lineages by hundreds of mitochondrial-to-mi-tochondrial horizontal transfer events (2, 50, 51). In strikingcontrast to the extremely divergent gene sequences in Viscum (SIAppendix, Fig. S8), including that of cox1 itself, the Viscum cox1intron is unexceptional in its level of sequence divergence com-

pared with other angiosperm cox1 introns (SI Appendix, Fig.S9A). This contrast strongly implies that the intron was acquiredrecently in the Viscum lineage, most likely independently of in-tron acquisitions in Lepionurus and Dendrophthoe, the two othermembers of the Santalales known to possess the intron (SI Ap-pendix, SI Discussion). The Viscum cox1 gene contains an un-usually lengthy region of putatively converted exonic sequence(known as a coconversion tract), as is also consistent with a re-cent horizontal acquisition (SI Appendix, Figs. S9B and S10).

Exceptional Divergence of Viscum Protein and rRNA Genes. Phylo-genetic analysis (SI Appendix, Fig. S8) of two of the three rRNAgenes and the nine best-conserved protein genes in Viscum showsthat they are even more divergent, often considerably so, thangenes in the fast-evolving S. conica and S. noctiflora mitoge-nomes (9). Moreover, the rRNA trees seriously underestimatethe extent of rRNA divergence (SI Appendix, legend of Fig. S8),and the 10 most divergent protein genes in Viscum (Table 1) areso divergent that we deemed it imprudent to use them for eitherphylogenetic analysis or analysis of synonymous (dS) and non-synonymous (dN) site divergence. Unusually frequent C-to-UmRNA editing in Viscum is highly unlikely to account for its highlevels of inferred protein divergence (SI Appendix, SI Discus-sion); rather, as described below, this divergence appears to bethe consequence of major changes in both mutational and se-lective pressures.Full alignments for the 10 most divergent Viscum proteins are

shown in SI Appendix, Fig. S11. These proteins are characterizedby short regions of moderate-to-high sequence similarity in-terspersed with longer regions exhibiting little to no detectablesequence similarity. These latter regions often include a numberof indels, and the reliability of the alignment in these regions isoften questionable. Their extraordinary level of divergence raisesthe question of whether these 10 “genes” code for functionalproteins. The most divergent proteins in Viscum are, for the mostpart, encoded by ORFs of moderate to considerable length(Table 1). Considering Viscum’s highly accelerated evolution,such an accumulation of nucleotide substitutions should haveintroduced premature stop codons into most if not all of theseORFs by chance. Given the frequency of stop codons in thegenome, and assuming a random distribution of point mutations,the probabilities of getting ORFs of lengths 50, 100, 150, and 200codons by chance alone are 0.14, 0.02, 0.003, and 0.0004, re-spectively. Thus, it is likely that these proteins are, by and large,functional, especially because the majority of them are of com-parable length to those found in other angiosperms (Table 1). Itshould be noted that the unprecedented level of divergence inViscum makes it difficult to rule out the possibility that un-identified protein genes exist in the mitogenome but haveevolved beyond our ability to recognize them.To assess further the level of divergence in Viscum protein

genes, dS and dN values were estimated for the nine best-con-served genes (Fig. 3, Table 1, and SI Appendix, Fig. S12). Syn-onymous substitutions in protein genes often are assumed to bemore or less free from natural selection and thus are used fre-quently to approximate the neutral mutation rate (52). Thepatterns of divergence in dS trees (Fig. 3A and SI Appendix, Fig.S12) are similar to those in the trees based on all three codonpositions (SI Appendix, Fig. S8). For almost all genes, dS branchlengths are longer for Viscum than for any of the other sampledangiosperms, including the high-rate Silene species (Fig. 3). Thisdivergence indicates a highly elevated mutation pressure operating(perhaps still so) across the mitogenome during a portion of theViscum lineage time. Estimates of the frequency and magnitude ofthis mutation-rate increase await sampling of mitogenomes from aphylogenetically broad range of Santalales taxa and the constructionof molecular chronograms for these taxa. Increased sequence di-vergence in seven sampled members of the Viscaceae (including

Skippington et al. PNAS | Published online June 22, 2015 | E3519

EVOLU

TION

PNASPL

US

Page 6: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

Fig. 3. Elevated sequence divergence of the best-conserved mitochondrial protein genes in Viscum. (A) Phylograms of dS and dN sequence divergence for sixof the nine best-conserved genes (Table 1). All trees are shown to the same scale. The absence of the cox2 gene in Vigna is indicated with an “X.” (B) Levels ofdS and dN sequence divergence for eight protein genes in Viscum and two rapidly evolving Silene mitochondrial genomes.

E3520 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al.

Page 7: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

one species of Viscum) was observed in a concatenated phylogeneticanalysis of one nuclear gene (SSU rDNA) and two plastid genes(rbcL, matK) (11). Although the magnitude of the mitochondrialelevation appears to be significantly higher than for the plastid andnuclear genes, it is possible that multiple factors underlie the mi-tochondrial situation and that one of these factors is common to allthree genomes.If mutation rates and the effects of selection on synonymous

sites were reasonably constant across the Viscum mitogenome,we would expect that synonymous substitution rates should befairly constant among its protein genes. This expectation is metfor eight of the nine best-conserved protein genes (dS = 0.68–1.06), but rps12 is a clear outlier (dS = 3.89) (Table 1; also seeSI Appendix, Fig. S12). This observation adds to the growing, andpuzzling, recognition that angiosperm mitogenomes can experi-ence substantial heterogeneity in intragenomic rates, althoughthe level found in Viscum pales against that in certain othergenomes, especially that of Ajuga (53).As expected, given the highly elevated mutation rate(s) in

Viscum mtDNA, dN branch lengths are highly elevated in Viscumrelative to other angiosperms (Fig. 3 and SI Appendix, Fig. S12).Contrary to the pattern seen for high-mutation-rate genomes inSilene (Fig. 3B) and in Plantago and Pelargonium (54), in whichdN/dS is depressed (0.04–0.07 for a concatenate of atp1, cob,cox1, cox2, and cox3, five of the nine best-conserved proteingenes in Viscum) relative to typically low-mutation angiospermgenomes, the average dN/dS value for these five genes in Viscumis 0.20 (Table 1). These different dN/dS values suggest a differentinterplay of mutational and selective forces among these in-dependently arisen, high-mutation-rate lineages. In particular, itseems likely that the set of 10 extremely divergent protein genes(SI Appendix, Fig. S11) has and/or is evolving under relaxed,if not to some extent positive, selection compared with otherangiosperms.As well as generally being very large, plant mitogenomes typi-

cally are characterized by extremely low rates of point mutation(54). In recent years, the mutational burden hypothesis, whichpredicts that mitogenome size should be negatively correlated withmutation rate (55), has been adopted by some to explain the or-igins of variation in mitogenome size. Under this hypothesis, highmutation pressure enhances natural selection against nonessentialDNA, and hence against large genomes. The reduced Viscum ge-nome and its highly elevated dS values are consistent with the hy-pothesis of genomic streamlining in response to a high mutationrate (55) but are in complete contrast to the enormous mitoge-nomes of S. noctiflora and S. conica, which have experienced bothdramatic accelerations in mitochondrial mutation rates and hugeexpansions of intergenic DNA (9). These entirely opposite findingsemphasize the need for further genome sequencing to provide abroader perspective on the remarkable genomic diversity acrossangiosperms and to enable refined hypotheses regarding the un-doubtedly multifaceted relationship among mutation rate, genomesize, and other aspects of genome evolution.

Abundant Repeats and Extreme Levels of Recombination and Sublimons.Repetitive sequences cover 26 kb (39%) of the Viscum mitoge-nome (Fig. 1B). In light of its small size and gene-dense nature, itis remarkable that large repeats (≥1 kb) cover so much of theViscum genome (16 kb, 24%), because this is a greater absoluteamount than seen in all but two of the 11 much larger (222- to983-kb) eudicot mitogenomes compared in ref. 56. Also re-markable is that repeat coverage in Viscum (39%), the smallestknown angiosperm mitogenome, is proportionally comparable tothat of the largest known mitogenome, that of S. conica (40.8%;4.6 Mb of repeat coverage in a 11.3-Mb genome) (9). In contrast,the also enormous (6.7-Mb) mitogenome of S. noctiflora—whichhas a high mutation rate—contains a relatively modest pro-portion of repetitive sequences (10.9%) compared with Viscum

and many other angiosperm mtDNAs (9). Thus, there is no strictrelationship between repeat content, genome size, and mutationrate in angiosperm mtDNAs.In angiosperm mitochondria, low-abundance substoichiometric

DNA molecules (termed “sublimons”) are often present, arising viarecombination between short repeats (57–59). Most pairs of repeatsin Viscum mtDNA are short; of those ≥30 bp in size, 97% (305 of315) are ≤100 bp in length, and 86% of these are <50 bp in length.To assess recombination at short Viscum repeats, we counted thenumber of read pairs that are consistent with the high-depth ref-erence assembly and those that are consistent with the predictedalternative configuration (AC) of the repeat-flanking single-copysequences that would result from repeat-mediated recombination.To avoid inflating the estimates of ACs, when two or more repeat-pair environments were supported by the same set of spanningreads, only one repeat pair was considered in our analysis. Wefound evidence for ACs for 127 (78%) of the 162 short-repeat pairsexamined (Fig. 4 and SI Appendix, Table S1). There is a strongcorrelation (r = 0.72, P < 0.001) between repeat length and ACfrequency (Fig. 4), but there is no correlation between repeat sim-ilarity and AC frequency (SI Appendix, Fig. S13).The AC frequency for short repeats in Viscum mtDNA is

markedly higher than in the nine other angiosperm mtDNAs forwhich AC frequency has been measured by high-depth genomesequencing and also is higher than in angiosperms in which non–genome-scale approaches have been used (59). Not even a singleAC was detected for the vast number (>300,000 and >70,000) ofshort-repeat pairs (30–100 bp in length) present in the mitoge-nomes of S. conica (11.3 Mb) and cucumber (1.7 Mb), re-spectively (9, 60). Alternative configurations were detected foronly one of the ca. 2,400 short repeats in the 6.7-Mb genome ofS. noctiflora genome (9, 60) and for only two of 110 short repeatsin the 404-kb genome of Vigna angularis (61). AC frequency wasnot reported for the 526-kb Mimulus mitogenome (62), but thenumber of AC-consistent reads was given and indicates muchlower AC frequency levels than in Viscum. As in Viscum, repeatlength correlates with AC frequency inMimulus. Finally, for fourSilene vulgaris mitogenomes (361–429 kb in size) analyzed inaggregate (63), AC frequency averaged only 0.8% (median =0.7%, range = 0.2–1.5%) for the 13 repeat pairs with a length of50–100 bp for which at least five ACs were detected. The numberof the repeats for which zero to four ACs were recovered was notreported, and therefore AC frequency again is markedly lowerthan in Viscum.Only four of the 10 repeat pairs in Viscum with a length ≥100 bp

are sufficiently short (387–593 bp) to allow ACs to be detected

Fig. 4. Alternative configurations for repeat pairs ≤100 bp in length.

Skippington et al. PNAS | Published online June 22, 2015 | E3521

EVOLU

TION

PNASPL

US

Page 8: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

given the library-fragment size of ∼800 bp. With AC frequencies of39–58%, all four repeat pairs are essentially at recombinationalequilibrium (i.e., a 50:50 ratio of reference configurations and ACs).To our knowledge, these are the smallest repeats at recombina-tional equilibrium in plant mitogenomes. As with the short repeats,these data are in striking contrast to the extremely low level of re-combination detected for the great majority of comparably sizedrepeats in S. conica, S. noctiflora, and cucumber (9, 60). Repeats ofthis size either are not present or were not analyzed in Vigna,Mimulus, and the four S. vulgaris mitogenomes (61–63).Although the Viscum mitogenome assembles as two contigs,

the extraordinary abundance and diversity of recombinant readpairs means that this configuration is highly unlikely to exist invivo. [This consideration is quite apart from the issue of whetherany plant mitogenome assembly actually corresponds to the invivo arrangement of the genome (64–66).] Instead, within anindividual and possibly even at the level of a single mitochondrion,the Viscum genome almost certainly exists as an extremely com-plicated population of recombinationally interrelated, overlappingmolecules of a broad range of sizes and stoichiometries. Althoughthe numerous sublimons detected in this study presumably all ulti-mately arose from repeat-mediated recombination, the extent towhich this recombination is actively ongoing is unclear and almostcertainly varies among repeats, with ongoing—and also high-fre-quency/equilibrating—recombination a safe inference for only the10 large repeat pairs. In contrast, as a consequence of occasionalto frequent ongoing repeat-mediated recombination, some of thesublimons associated with the 127 short repeats almost certainlyexist in forms independent of the main genome. Indeed, many ofthe reads indicative of sublimons have polymorphisms relative tothe reference assembly, suggesting that many regions are evolvingindependently of the predominant forms of the genome. The moreabundant ACs may even represent multiple independently arisenand replicating sublimons.All 10 large (387- to 5,439-bp) repeat pairs in Viscum show

either 100% identity or, in two cases, 99.9% identity (SI Ap-pendix, Table S1). This level of identity is yet another way inwhich the highly divergent Viscum genome differs dramaticallyfrom the also divergent S. noctiflora and S. conica mitogenomes: InS. noctiflora and S. conica only a tiny fraction of the hundreds tothousands of repeats, respectively, with a length of 400–2,000 bp are100% identical (most are 75–90% identical) (9). These differencesimply much higher rates of gene conversion/concerted evolution inViscum than in these two Silene species. The greatly reduced levelsof both homologous recombination (as measured by AC frequency)and gene conversion in these two Silene genomes were hypothesizedto be potentially important factors underlying the highly elevatedmutation rates in these genomes (9). Therefore it is all the moreintriguing that, compared with these other high-mutation-ratemitogenomes, Viscum has high levels of these two mechanis-tically related recombinational processes.

Conclusion and OutlookThe mitogenome of the hemiparasitic mistletoe V. scurruloideumhas proven to be an unexpected goldmine of surprises, settingnew benchmarks for the extremes of genome size, gene content,protein divergence, and recombinational activity and/or sub-limon levels in angiosperm mtDNA. The gene repertoire of theViscum mitogenome may be so specialized to parasitic life that,unlike all other multicellular organisms investigated so far, it cansurvive without a mitochondrial-encoded respiratory complex I.Recently the intracellular parasitic fungus Rozella allomycis alsowas found to harbor a very rapidly evolving genome that lackscomplex I of the respiratory chain (29). This discovery raises thetantalizing possibility that shared genomic signatures of parasit-ism may be present in mitogenomes both within and beyond thebounds of the angiosperm clade.

The Viscum findings have begun to illuminate patterns and modesof evolution of angiosperm mitogenomes that previously were un-recognized, raising deep questions about the diversity of genelandscapes, the effect of mutational environments on genomic con-traction and expansion, and which genes, if any, are truly essential inangiosperm mitogenomes. We expect that greatly expanded sam-pling of the speciose and diverse Santalales, as well as the manyother lineages of parasitic angiosperms, will provide significant in-sights into the evolution and function of parasitism in angiosperms,uncover an increasingly fascinating world of mitochondrial diversity,and shed light on the origins of the massive amount of foreign an-giosperm mtDNA present in the Amborella mitogenome.

Materials and MethodsAssembly. The Viscum mitogenome assembly was generated using 100-bppaired-end Illumina HiSeq2000 reads (with 800-bp fragment size) from leaf-derived total genomic DNA (SNP 15591; collected in Sabah, Malaysia, Hamadon,kg. Bundu Tuhan, 1,300 m, by T.J.B., September 22, 2000). The relatively smallfraction of reads (<1%) corresponding to the mitochondrial genome wasextracted from the pool of 82.5 million total genomic reads by iterativelyextending contigs. Reads determined to be similar to known angiosperm mito-chondrial genes using BLASTN (67) were used as seeds. These reads then wereblasted against all reads, and reads with no or few high-quality base conflicts inthe aligned region and which overhung at the end were used to extend apseudo contig. The ends of nascent contigs were blasted again and extendeduntil no further extension was obtained. To identify identical or highly similarreads, megablast was used with a word size of 32. Pools of reads derived fromthese iterations were assembled using CAP3 (68) with the following parametersettings: a = 11, b = 16, d = 21, e = 11, o = 50, f = 2, h = 500, i = 60, j = 80, s =1,800, and t = 300. The majority of the contig ends corresponded to repeats thatcould be resolved by read depth, read-pair conflicts, and identification of repeatboundaries for which a large portion of reads were chimeric outside the repeatregion. Repeat sequences were identified by comparing the mitogenome toitself using ungapped BLASTN with the parameter settings xdrop_gap = 10 andxdrop_gap_final = 10. All alignments with an e-value ≤1e-7 were consideredrepeats for calculations of repeat number. The repeat map (Fig. 1B) was gen-erated with Circos (69).

The final assembly contained two contigs, a 42,186-bp contig that could berationalized into a circle based on read-pair information and a 23,687-bp kbcontig that was terminated by repeats that matched within the 42-kb contigand whose lengths could not be determined using read-depth and repeat-boundary information. In this sense, therefore, the Viscum mitogenomeassembly (Fig. 1A) is incomplete. These repeats may be relatively long andpossibly overlap with other repeats, making their boundaries difficult toidentify. The read depth adjacent to the corresponding repeat regions in the42-kb contig is too variable to make a clear judgment as to precisely wherethe terminal repeats in the 24-kb contig end. The contigs we present are wellsupported in terms of high-quality read depth and read-pair depth, with theexception of a low-depth region around 16.9 kb in the 42-kb contig (SIAppendix, Fig. S1). This region corresponds to a palindromic sequence. Therelative positions of forward- and reverse-oriented reads mapping in thisregion suggest that the palindrome may have interfered with the paired-end amplification of sequences containing this palindrome.

We looked for potentially missing mitochondrial sequences by using readsthat map to the assembly but whose read-pair mates do not (singlets). Al-though we identified many singlet reads, using their mate pairs as iterativeassembly seeds (see above) did not produce any high-depth contigs or contigsthat were part of the main mitochondrial genome. On the basis of poly-morphisms and homology to sequences in the GenBank nt/nr database, weinfer that the majority of these singlets probably correspond to mitochon-drial-like sequences in the nucleus. It is conceivable that some nongenic orhighly divergent genic regions of the genome are missing from our assem-bly, but, given its gene density (Fig. 1A) and the high number of singletpartners that did not yield mitochondrial-like contigs, this notion is unlikely.

Coverage Estimation. Nuclear genome coverage was estimated at 0.1× byusing the Lander/Waterman equation and conservatively assuming V. scur-ruloideum to have the same nuclear genome size (2C = 124.6) as Viscumminimum, which has the smallest nuclear genome of the four Viscum speciesexamined thus far (16).

Gene Annotation. Contig sequences were compared by BLASTX v. 2.2.28 (67)to mitochondrial protein sequences from 55 complete green algal and land plant

E3522 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al.

Page 9: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

mitogenomes. Gene boundaries were annotated based on inspection of align-ments satisfying e-value <0.001. The locations of rRNA genes were determinedsimilarly by BLASTN searches against the rRNA gene sequences of these ge-nomes. tRNA genes were predicted using tRNAscan version 1.23 (70).

RNA Editing Sites. C-to-U RNA editing sites were predicted using PREP-Mitochondrial (71) with stringency settings (C-values) of 0.2, 0.6, and 1.0 andPREPACT v. 2.12.3 (72) with default parameter settings. Viscum results arereported (SI Appendix, Table S2) for multiple stringency settings becauseprevious analyses in Silene have suggested that different C-value settings areappropriate for species with different rates of sequence evolution (9). Allresults reported in the main text were obtained using PREP-Mitochondrialwith C-value = 0.2. The predicted editing sites probably underestimate thetrue extent of editing within Viscum, because the PREP-Mitochondrial ap-proach is blind to synonymous editing sites.

Phylogenetic Analysis. Protein sequences were extracted from GenBank andaligned using MAFFT v. 7.130b (73) with the L-INS-I option. The resulting aminoacid alignments then were computationally converted to nucleotide alignmentsusing PAL2NAL (74), with the resulting arrangement of nucleotide tripletsreflecting the protein alignment in the case of each gene set. To avoid bias inphylogenetic analysis and evolutionary rate estimation arising from C-to-URNA editing, codons known to undergo mitochondrial RNA editing in at leastone of eight angiosperms (A. thaliana, Beta vulgaris, B. napus, Citrullus lanatus,Cucurbita pepo, Oryza sativa, Silene latifolia, and Vitis vinifera) (SI Appendix,

Table S3) or predicted to undergo RNA editing in V. scurruloideum (SI Appendix,Tables S2 and S3) were excluded from all nucleotide analyses. rRNA gene se-quences were aligned similarly using MAFFT L-INS-I, but at the nucleotide level.Ambiguously aligned regions of the rRNA gene alignments were removed usingGBLOCKS version 0.91b, with the following parameter settings: t = c, b1 = 11,b2 = 16, b3 = 8, b4 = 5, and b5 = none (75). These stringent settings removelarge fractions of most alignments. All phylogenetic trees were estimated withRAxML v. 7.2.8 (76) using the generalized time-reversible (GTR) model withgamma correction for among-site rate variation and 10 starting trees. Supportfor nodes was assessed using 1,000 bootstrap replicates. The ancestral sequencesshown in SI Appendix, Figs. S9, S10, and S11 were reconstructed using basemland codeml in the PAML package (77).

Evolutionary Rate Estimation. PAML’s codeml (77) was used to estimatebranch dS and dN, using a simplified Goldman–Yang codon model withseparate branch dN/dS ratios (ω) that allowed for the following three sets ofbranches: the Viscum branch; the S. conica and S. noctiflora branches; and allremaining branches. Codon frequencies were computed using the F3 × 4method. The values for ω and transition/transversion ratio were estimatedfrom the data with start values of 0.4 and 2, respectively.

ACKNOWLEDGMENTS. We thank Andy Alverson and Virginia Sanchez-Puertafor providing data used in certain analyses and Andy Alverson and Dan Sloan forcritical reading of the manuscript. This work was supported in part by NationalScience Foundation Award IOS 1027529 (to J.D.P.).

1. Westwood JH, Yoder JI, TimkoMP, dePamphilis CW (2010) The evolution of parasitismin plants. Trends Plant Sci 15(4):227–235.

2. Barkman TJ, et al. (2007) Mitochondrial DNA suggests at least 11 origins of parasitismin angiosperms and reveals genomic chimerism in parasitic plants. BMC Evol Biol 7:248.

3. Heide-Jørgensen HS (2008) Parasitic Flowering Plants (Koninklijke Brill NV, Leiden,The Netherlands).

4. Molina J, et al. (2014) Possible loss of the chloroplast genome in the parasitic flow-ering plant Rafflesia lagascae (Rafflesiaceae). Mol Biol Evol 31(4):793–803.

5. Wicke S, et al. (2013) Mechanisms of functional and physical genome reduction inphotosynthetic and nonphotosynthetic parasitic plants of the broomrape family.Plant Cell 25(10):3711–3725.

6. Schelkunov MI, et al. (2015) Exploring the limits for reduction of plastid genomes: Acase study of the mycoheterotrophic orchids Epipogium aphyllum and Epipogiumroseum. Genome Biol Evol 7(4):1179–1191.

7. Mower JP, Jain K, Hepburn NJ (2012) The role of horizontal transfer in shaping theplant mitochondrial genome. Advances in Botanical Research, ed Maréchal-Drouard L(Academic, New York), Vol 63: Mitochondrial Genome Evolution, pp 41–69.

8. Xi Z, et al. (2013) Massive mitochondrial gene transfer in a parasitic flowering plantclade. PLoS Genet 9(2):e1003265.

9. Sloan DB, et al. (2012) Rapid evolution of enormous, multichromosomal genomes inflowering plant mitochondria with exceptionally high mutation rates. PLoS Biol 10(1):e1001241.

10. Mower JP, Sloan DB, Alverson AJ (2012) Plant mitochondrial diversity—the genomicsrevolution. Plant Genome Diversity, eds Wendel JF, Greilhuber J, Dolezel, I Leitch IJ(Springer, Berlin), pp 123–144.

11. Der JP, Nickrent DL (2008) A molecular phylogeny of Santalaceae (Santalales). Syst Bot33(1):107–116.

12. Nickrent DL, Malécot V, Vidal-Russell R, Der JP (2010) A revised classification ofSantalales. Taxon 59(2):538–558.

13. Nickrent DL, Der JP, Anderson FE (2005) Discovery of the photosynthetic relatives ofthe “Maltese mushroom” Cynomorium. BMC Evol Biol 5:38.

14. Mathiasen RL, Nickrent DL, Shaw DC, Watson DM (2008) Mistletoes: Pathology, sys-tematics, ecology, and management. Plant Dis 92(7):988–1006.

15. Rice DW, et al. (2013) Horizontal transfer of entire genomes via mitochondrial fusionin the angiosperm Amborella. Science 342(6165):1468–1473.

16. Zonneveld BJM (2010) New record holders for maximum genome size in eudicots andmonocots. J Bot, 10.1155/2010/527357.

17. Adams KL, Qiu YL, Stoutemyer M, Palmer JD (2002) Punctuated evolution of mito-chondrial gene content: High and variable rates of mitochondrial gene loss andtransfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA 99(15):9905–9912.

18. Adams KL, Palmer JD (2003) Evolution of mitochondrial gene content: Gene loss andtransfer to the nucleus. Mol Phylogenet Evol 29(3):380–395.

19. Richardson AO, Rice DW, Young GJ, Alverson AJ, Palmer JD (2013) The “fossilized”mitochondrial genome of Liriodendron tulipifera: Ancestral gene content and order,ancestral editing sites, and extraordinarily low mutation rate. BMC Biol 11:29.

20. Kim S, Yoon MK (2010) Comparison of mitochondrial and chloroplast genome seg-ments from three onion (Allium cepa L.) cytoplasm types and identification of a trans-splicing intron of cox2. Curr Genet 56(2):177–188.

21. Hecht J, Grewe F, Knoop V (2011) Extreme RNA editing in coding islands and abun-dant microsatellites in repeat sequences of Selaginella moellendorffii mitochondria:The root of frequent plant mtDNA recombination in early tracheophytes. GenomeBiol Evol 3:344–358.

22. Wahleithner JA, MacFarlane JL, Wolstenholme DR (1990) A sequence encoding a

maturase-related protein in a group II intron of a plant mitochondrial nad1 gene.

Proc Natl Acad Sci USA 87(2):548–552.23. Brown GG, Colas des Francs-Small C, Ostersetzer-Biran O (2014) Group II intron

splicing factors in plant mitochondria. Front Plant Sci 5:35.24. Rurek M (2008) Proteins involved in maturation pathways of plant mitochondrial and

plastid c-type cytochromes. Acta Biochim Pol 55(3):417–433.25. Daley DO, Clifton R, Whelan J (2002) Intracellular gene transfer: Reduced hydro-

phobicity facilitates gene transfer for subunit 2 of cytochrome c oxidase. Proc Natl

Acad Sci USA 99(16):10510–10515.26. Daley DO, Whelan J (2005) Why genes persist in organelle genomes. Genome Biol

6(5):110.27. Saraste M (1999) Oxidative phosphorylation at the fin de siècle. Science 283(5407):

1488–1493.28. Gabaldón T, Rainey D, Huynen MA (2005) Tracing the evolution of a large protein

complex in the eukaryotes, NADH:ubiquinone oxidoreductase (Complex I). J Mol Biol

348(4):857–870.29. James TY, et al. (2013) Shared signatures of parasitism and phylogenomics unite

Cryptomycota and microsporidia. Curr Biol 23(16):1548–1553.30. Flegontov P, et al. (2015) Divergent mitochondrial respiratory chains in phototrophic

relatives of apicomplexan parasites. Mol Biol Evol 32(5):1115–1131, 10.1093/molbev/

msv021.31. Lang BF, Gray MW, Burger G (1999) Mitochondrial genome evolution and the origin

of eukaryotes. Annu Rev Genet 33:351–397.32. Popot JL, de Vitry C (1990) On the microassembly of integral membrane proteins.

Annu Rev Biophys Biophys Chem 19:369–403.33. Claros MG, et al. (1995) Limitations to in vivo import of hydrophobic proteins into

yeast mitochondria. The case of a cytoplasmically synthesized apocytochrome b. Eur J

Biochem 228(3):762–771.34. Marcet-Houben M, Marceddu G, Gabaldón T (2009) Phylogenomics of the oxidative

phosphorylation in fungi reveals extensive gene duplication followed by functional

divergence. BMC Evol Biol 9:295.35. Gutierres S, et al. (1997) Lack of mitochondrial and nuclear-encoded subunits of

complex I and alteration of the respiratory chain in Nicotiana sylvestris mitochondrial

deletion mutants. Proc Natl Acad Sci USA 94(7):3436–3441.36. Pineau B, Mathieu C, Gérard-Hirne C, De Paepe R, Chétrit P (2005) Targeting the

NAD7 subunit to mitochondria restores a functional complex I and a wild type phe-

notype in the Nicotiana sylvestris CMS II mutant lacking nad7. J Biol Chem 280(28):

25994–26001.37. Meyer EH, et al. (2009) Remodeled respiration in ndufs4 with low phosphorylation

efficiency suppresses Arabidopsis germination and growth and alters control of me-

tabolism at night. Plant Physiol 151(2):603–619.38. Karpova OV, Newton KJ (1999) A partially assembled complex I in NAD4-deficient

mitochondria of maize. Plant J 17(5):511–521.39. Marienfeld JR, Newton KJ (1994) The maize NCS2 abnormal growth mutant has a

chimeric nad4-nad7 mitochondrial gene and is associated with reduced complex I

function. Genetics 138(3):855–863.40. Sabar M, De Paepe R, de Kouchkovsky Y (2000) Complex I impairment, respiratory

compensations, and photosynthetic decrease in nuclear and mitochondrial male

sterile mutants of Nicotiana sylvestris. Plant Physiol 124(3):1239–1250.41. Rasmusson AG, Soole KL, Elthon TE (2004) Alternative NAD(P)H dehydrogenases of

plant mitochondria. Annu Rev Plant Biol 55:23–39.

Skippington et al. PNAS | Published online June 22, 2015 | E3523

EVOLU

TION

PNASPL

US

Page 10: Miniaturized mitogenome of the parasitic plant …Miniaturized mitogenome of the parasitic plant Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

42. Yamato KT, Newton KJ (1999) Heteroplasmy and homoplasmy for maize mitochon-drial mutants: A rare homoplasmic nad4 deletion mutant plant. J Hered 90(3):369–373.

43. Noctor G, Dutilleul C, De Paepe R, Foyer CH (2004) Use of mitochondrial electrontransport mutants to evaluate the effects of redox state on photosynthesis, stresstolerance and the integration of carbon/nitrogen metabolism. J Exp Bot 55(394):49–57.

44. Dutilleul C, et al. (2003) Functional mitochondrial complex I is required by tobaccoleaves for optimal photosynthetic performance in photorespiratory conditions andduring transients. Plant Physiol 131(1):264–275.

45. Hollinger DY (1983) Photosynthesis and water relations of the mistletoe, Phoraden-dron villosum, and its host, the California calley oak, Quercus lobata. Oecologia 60(3):396–400.

46. Ehleringer JR, Cook CS, Tieszen LL (1986) Comparative water-use and nitrogen re-lationships in a mistletoe and its host. Oecologia 68(2):279–284.

47. Johnson JM, Choinski JS (1993) Photosynthesis in the Tapinanthus-Diplorhynchusmistletoe host relationship. Ann Bot (Lond) 72(2):117–122.

48. Marshall JD, Ehleringer JR, Schulze ED, Farquhar G (1994) Carbon-isotope composi-tion, gas-exchange and heterotrophy in Australian mistletoes. Funct Ecol 8(2):237–241.

49. Hao W (2010) OrgConv: Detection of gene conversion using consensus sequences andits application in plant mitochondrial and chloroplast homologs. BMC Bioinformatics11:114.

50. Cho Y, Qiu YL, Kuhlman P, Palmer JD (1998) Explosive invasion of plant mitochondriaby a group I intron. Proc Natl Acad Sci USA 95(24):14244–14249.

51. Sanchez-Puerta MV, Cho Y, Mower JP, Alverson AJ, Palmer JD (2008) Frequent,phylogenetically local horizontal transfer of the cox1 group I Intron in floweringplant mitochondria. Mol Biol Evol 25(8):1762–1777.

52. Kimura M (1984) The Neutral Theory of Molecular Evolution (Cambridge Univ Press,Cambridge, UK).

53. Zhu A, Guo W, Jain K, Mower JP (2014) Unprecedented heterogeneity in the syn-onymous substitution rate within a plant genome. Mol Biol Evol 31(5):1228–1236.

54. Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD (2007) Extensive variation insynonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol7:135.

55. Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of or-ganelle genomic architecture. Science 311(5768):1727–1730.

56. Alverson AJ, Zhuo S, Rice DW, Sloan DB, Palmer JD (2011) The mitochondrial genomeof the legume Vigna radiata and the analysis of recombination across short mito-chondrial repeats. PLoS ONE 6(1):e16404.

57. Small ID, Isaac PG, Leaver CJ (1987) Stoichiometric differences in DNA moleculescontaining the atpA gene suggest mechanisms for the generation of mitochondrialgenome diversity in maize. EMBO J 6(4):865–869.

58. Small I, Suffolk R, Leaver CJ (1989) Evolution of plant mitochondrial genomes viasubstoichiometric intermediates. Cell 58(1):69–76.

59. Woloszynska M (2010) Heteroplasmy and stoichiometric complexity of plant mito-chondrial genomes—though this be madness, yet there’s method in’t. J Exp Bot 61(3):657–671.

60. Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD (2011) Origins and re-combination of the bacterial-sized multichromosomal mitochondrial genome of cu-cumber. Plant Cell 23(7):2499–2513.

61. Naito K, Kaga A, Tomooka N, Kawase M (2013) De novo assembly of the completeorganelle genome sequences of azuki bean (Vigna angularis) using next-generationsequencers. Breed Sci 63(2):176–182.

62. Mower JP, Case AL, Floro ER, Willis JH (2012) Evidence against equimolarity of largerepeat arrangements and a predominant master circle structure of the mitochondrialgenome from a monkeyflower (Mimulus guttatus) lineage with cryptic CMS. GenomeBiol Evol 4(5):670–686.

63. Sloan DB, Müller K, McCauley DE, Taylor DR, Storchová H (2012) Intraspecific variationin mitochondrial genome sequence, structure, and gene content in Silene vulgaris, anangiosperm with pervasive cytoplasmic male sterility. New Phytol 196(4):1228–1239.

64. Sloan DB (2013) One ring to rule them all? Genome sequencing provides new insightsinto the ‘master circle’ model of plant mitochondrial DNA structure. New Phytol200(4):978–985.

65. Bendich AJ (1993) Reaching for the ring: The study of mitochondrial genome struc-ture. Curr Genet 24(4):279–290.

66. Backert S, Börner T (2000) Phage T4-like intermediates of DNA replication and re-combination in the mitochondria of the higher plant Chenopodium album (L.). CurrGenet 37(5):304–314.

67. Camacho C, et al. (2009) BLAST+: Architecture and applications. BMC Bioinformatics10:421.

68. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res9(9):868–877.

69. Krzywinski M, et al. (2009) Circos: An information aesthetic for comparative geno-mics. Genome Res 19(9):1639–1645.

70. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transferRNA genes in genomic sequence. Nucleic Acids Res 25(5):955–964.

71. Mower JP (2009) The PREP suite: Predictive RNA editors for plant mitochondrialgenes, chloroplast genes and user-defined alignments. Nucleic Acids Res 37(WebServer issue):W253-9.

72. Lenz H, Knoop V (2013) PREPACT 2.0: Predicting C-to-U and U-to-C RNA editing inorganelle genome sequences with multiple references and curated RNA editing an-notation. Bioinform Biol Insights 7:1–19.

73. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7:Improvements in performance and usability. Mol Biol Evol 30(4):772–780.

74. Suyama M, Torrents D, Bork P (2006) PAL2NAL: Robust conversion of protein se-quence alignments into the corresponding codon alignments. Nucleic Acids Res34(Web Server issue):W609-12.

75. Castresana J (2000) Selection of conserved blocks from multiple alignments for theiruse in phylogenetic analysis. Mol Biol Evol 17(4):540–552.

76. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analy-ses with thousands of taxa and mixed models. Bioinformatics 22(21):2688–2690.

77. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol24(8):1586–1591.

78. Stevens PF (2014) Angiosperm Phylogeny Website. Available at www.mobot.org/mobot/research/apweb/welcome.html. Accessed March 1, 2015.

E3524 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al.